Since the Industrial revolution, humanity has been using fossil fuels to generate electricity. Unfortunately we cannot continue this trend anymore because of Climate Change. We have already emitted so much C02 that we have raised Earth's average temperature by 1.1 degree Celsius.
Since we all know that electricity is the basis of modern life and without it we cannot survive, one of the most important question which we are looking to answer is whether Non Fossils fuels and renewable sources of electricity can meet our electricity power demands 100% by 2050.
We would like to know what goes behind the scenes in order to generate electricity and observe the changes in method of electricity generation trends over a certain time period for almost all of the regions on our planet.
For our project, we have classified Coal, Natural Gas and Oil as dirty fuel sources (because the emit a lot of CO2). Nuclear, Hydro, Wind and Solar have been deemed as Eco-Friendly source because it does not emit any CO2 during its operation.
The main domain of interest for our group is “Sources of Electricity generation”. The reason why we are only focusing only on Electricity aspect is because we utilise more than 1/3rd of our Energy consumption just for Electricity generation.
Reference: - https://www.eia.gov/totalenergy/data/monthly/pdf/flow/total-energy-spaghettichart-2021.pdf
Made By: - Swarnim Khosla
Topic: - Analysis of different methods of electricity generation on the basis of various countries
Dataset Responsibility: - Electricity produced through both Eco-Friendly and CO2 intensive sources of generation.
Note: - Visualisations at the end (Section 10).
Skills demonstarted in the project: - SQL, Python (SQLAlchemy, Pandas, Plotly)
Source: - https://ourworldindata.org/grapher/nuclear-energy-generation?tab=table&country=~OWID_WRL
Source: - https://ourworldindata.org/grapher/hydropower-consumption?tab=table
Source: - https://ourworldindata.org/grapher/wind-generation?tab=table
Source: - https://ourworldindata.org/grapher/solar-energy-consumption?tab=table
Source: - https://ourworldindata.org/grapher/electricity-coal?tab=table
Source: - https://ourworldindata.org/grapher/electricity-gas?tab=table
Source: - https://ourworldindata.org/grapher/electricity-oil?tab=table
# Importing necessary libraries
import pandas as pd
import sqlalchemy as sq
# Reading Nuclear.csv file
df_nuclear = pd.read_csv("nuclear-energy-generation.csv")
#display(df_nuclear)
# Connecting to MySQL Database now.
engine = sq.create_engine('mysql+mysqlconnector://swarnim_khosla:9LPD90DMX@datasciencedb.ucalgary.ca/swarnim_khosla')
# Writing the Nuclear dataframe into a table
# Reference : - https://www.w3resource.com/pandas/dataframe/dataframe-to_sql.php
df_nuclear.to_sql("nuclear", engine, if_exists = 'replace', index=False, method='multi') # Pushes all data to MySql
# Demonstartation of the fact that import has been successful.
df_nuclear_table = pd.read_sql_table("nuclear", engine) # Reading the data.
df_nuclear_table.head(10)
| Entity | Code | Year | Electricity from nuclear (TWh) | |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2000 | 0.0 |
| 1 | Afghanistan | AFG | 2001 | 0.0 |
| 2 | Afghanistan | AFG | 2002 | 0.0 |
| 3 | Afghanistan | AFG | 2003 | 0.0 |
| 4 | Afghanistan | AFG | 2004 | 0.0 |
| 5 | Afghanistan | AFG | 2005 | 0.0 |
| 6 | Afghanistan | AFG | 2006 | 0.0 |
| 7 | Afghanistan | AFG | 2007 | 0.0 |
| 8 | Afghanistan | AFG | 2008 | 0.0 |
| 9 | Afghanistan | AFG | 2009 | 0.0 |
What does each column mean?
1) Entity column is basically a "List of Countries" column. 2) Code column tells us the 3 Character code of each country. If a row does not have Code, it means that Entity is NOT a country but a specific region on Earth. 3) Year column tells us the specific year for which that paticular data was collected. 4) Electricity from nuclear (TWh) gives us the electricity generated from Nuclear power in Terra-Watt Hours.
# Checking those records where 'Code' is NULL.
# Query_1
query_table = pd.read_sql_query('''SELECT DISTINCT `Entity` FROM nuclear
WHERE `Code` IS NULL;''', engine)
display(query_table)
| Entity | |
|---|---|
| 0 | Africa |
| 1 | Africa (BP) |
| 2 | Asia |
| 3 | Asia Pacific (BP) |
| 4 | CIS (BP) |
| 5 | Central America (BP) |
| 6 | Eastern Africa (BP) |
| 7 | Europe |
| 8 | Europe (BP) |
| 9 | European Union (27) |
| 10 | G20 (Ember) |
| 11 | G7 (Ember) |
| 12 | High-income countries |
| 13 | Latin America and Caribbean (Ember) |
| 14 | Low-income countries |
| 15 | Lower-middle-income countries |
| 16 | Middle Africa (BP) |
| 17 | Middle East (BP) |
| 18 | Middle East (Ember) |
| 19 | Non-OECD (BP) |
| 20 | North America |
| 21 | North America (BP) |
| 22 | OECD (BP) |
| 23 | OECD (Ember) |
| 24 | Oceania |
| 25 | Other Africa (BP) |
| 26 | Other Asia Pacific (BP) |
| 27 | Other CIS (BP) |
| 28 | Other Caribbean (BP) |
| 29 | Other Europe (BP) |
| 30 | Other Middle East (BP) |
| 31 | Other Northern Africa (BP) |
| 32 | Other South America (BP) |
| 33 | Other South and Central America (BP) |
| 34 | Other Southern Africa (BP) |
| 35 | South America |
| 36 | South and Central America (BP) |
| 37 | Upper-middle-income countries |
| 38 | Western Africa (BP) |
As we can see above, we are given the list of all the Entities which are not countries.
For our project we are only concerned with Countries and not regions. Therefore we shall delete these items.
# Query_2
engine.execute('''SET SQL_SAFE_UPDATES = 0;''')
# Here we have disabled 'Safe Mode' in MySQL in order to delete those regions.
# Reference: - https://linuxhint.com/mysql-error-code-1175-during-update-mysql/
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff4384254d0>
# Query_3
engine.execute('''DELETE FROM nuclear WHERE `CODE` IS NULL;''')
# Here we are deleting all the rows whose Entity was a Region and not a Country.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bc9e950>
# We shall now confirm that those records have been deleted where Entity IS NULL.
# Query_4
query_table = pd.read_sql_query('''SELECT DISTINCT(`Code`), `Entity` FROM nuclear;''', engine)
display(query_table)
| Code | Entity | |
|---|---|---|
| 0 | AFG | Afghanistan |
| 1 | ALB | Albania |
| 2 | DZA | Algeria |
| 3 | ASM | American Samoa |
| 4 | AGO | Angola |
| ... | ... | ... |
| 212 | ESH | Western Sahara |
| 213 | OWID_WRL | World |
| 214 | YEM | Yemen |
| 215 | ZMB | Zambia |
| 216 | ZWE | Zimbabwe |
217 rows × 2 columns
Each country is supposed to have a 3 letter Code attached to it.
We can notice that there are some Entities which have more than 3 letter code.
Let us explore that further.
# Query_5
query_table = pd.read_sql_query('''SELECT `Entity`, `Code`, CHAR_LENGTH(`CODE`) AS Length
FROM nuclear
GROUP BY `Entity`
HAVING CHAR_LENGTH(`CODE`) <> 3;''', engine)
# In this SQL query we are looking out for all records where Character length of 'Code' column is not equal to 3.
display(query_table)
| Entity | Code | Length | |
|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 8 |
| 1 | USSR | OWID_USS | 8 |
| 2 | World | OWID_WRL | 8 |
Here we can see that there are 3 peculiar records in our table. Let us see all of these records in its entirety.
# Query_6
query_table = pd.read_sql_query('''SELECT *
FROM nuclear
WHERE `Entity` IN ('Kosovo', 'USSR', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity from nuclear (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 0.000000 |
| 1 | Kosovo | OWID_KOS | 2001 | 0.000000 |
| 2 | Kosovo | OWID_KOS | 2002 | 0.000000 |
| 3 | Kosovo | OWID_KOS | 2003 | 0.000000 |
| 4 | Kosovo | OWID_KOS | 2004 | 0.000000 |
| ... | ... | ... | ... | ... |
| 93 | World | OWID_WRL | 2017 | 2637.234619 |
| 94 | World | OWID_WRL | 2018 | 2700.464111 |
| 95 | World | OWID_WRL | 2019 | 2796.354004 |
| 96 | World | OWID_WRL | 2020 | 2693.978516 |
| 97 | World | OWID_WRL | 2021 | 2800.267822 |
98 rows × 4 columns
As we can see above, the result set for Kosovo, USSR and the World is given to us.
We would be deleting all of the World records because we are interested in countries and not in specific regions.
We would be deleting all of the records for USSR, because it no longer exists as a real Entity and is therefore no longer relevant. It is also to be noted that records for all post-soviet countries like Russia, Ukraine etc. are given in the table.
We would be deleting all of the records for Kosovo because it is an illegitimate state which is only recognised by 50% of the total countries in UN today. Also the records for Kosovo are useless becuase it shows '0' for all of the years.
Reference: - https://en.wikipedia.org/wiki/International_recognition_of_Kosovo
# Query_7
engine.execute('''DELETE FROM nuclear WHERE `Entity` IN ('Kosovo', 'USSR', 'World');''')
# Here we are deleting all the rows for Kosovo, USSR and World.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bc9eb50>
Now we shall look at the list of all of the unique countries who atleast generated some energy from nuclear.
# Query_8
# The query given below shall help us give distinct Entites and the sum of electricity they have generated from Nuclear power.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Entity`), SUM(`Electricity from nuclear (TWh)`) AS Total_Nuclear_Energy
FROM nuclear
GROUP BY `Entity`
HAVING Total_Nuclear_Energy <> 0
ORDER BY Total_Nuclear_Energy DESC;''', engine)
display(query_table)
| Entity | Total_Nuclear_Energy | |
|---|---|---|
| 0 | United States | 30585.713050 |
| 1 | France | 15084.615879 |
| 2 | Japan | 7974.783257 |
| 3 | Germany | 5608.414777 |
| 4 | Russia | 5536.170525 |
| 5 | South Korea | 4061.015937 |
| 6 | Canada | 3640.816111 |
| 7 | United Kingdom | 3380.199711 |
| 8 | China | 3103.685854 |
| 9 | Ukraine | 2900.260307 |
| 10 | Sweden | 2729.684004 |
| 11 | Spain | 2188.061997 |
| 12 | Belgium | 1738.848128 |
| 13 | Taiwan | 1400.442735 |
| 14 | Switzerland | 1055.958107 |
| 15 | Finland | 897.833579 |
| 16 | India | 759.094873 |
| 17 | Czechia | 749.470882 |
| 18 | Bulgaria | 666.009738 |
| 19 | Slovakia | 557.227974 |
| 20 | Hungary | 525.662389 |
| 21 | South Africa | 430.291888 |
| 22 | Brazil | 335.832116 |
| 23 | Lithuania | 301.926648 |
| 24 | Argentina | 294.219996 |
| 25 | Mexico | 292.558012 |
| 26 | Romania | 224.006400 |
| 27 | Netherlands | 184.997708 |
| 28 | Slovenia | 171.910687 |
| 29 | Pakistan | 110.190269 |
| 30 | Italy | 89.892000 |
| 31 | Iran | 49.973792 |
| 32 | Armenia | 49.240000 |
| 33 | United Arab Emirates | 12.168030 |
| 34 | Belarus | 6.118000 |
| 35 | Kazakhstan | 2.488122 |
There are only 36 countries! which has ever generated electricity from eco-friendly Nuclear Power.
Now I would like to know the duration and number of years that a specific country used Nuclear to generate electricity.
# Query_9
# The query given below shall tell us the starting year (oldest year) and the latest year (that is available in the dataset) of nuclear energy production.
query_table = pd.read_sql_query('''SELECT DISTINCT (`Entity`), MIN(`Year`) AS Oldest_year, MAX(`YEAR`) AS Latest_year, MAX(`YEAR`) - MIN(`Year`) AS Duration
FROM nuclear
WHERE `Electricity from nuclear (TWh)` <> 0
GROUP BY `Entity`
ORDER BY Duration DESC, SUM(`Electricity from nuclear (TWh)`) DESC;''', engine)
display(query_table)
| Entity | Oldest_year | Latest_year | Duration | |
|---|---|---|---|---|
| 0 | United States | 1965 | 2021 | 56 |
| 1 | France | 1965 | 2021 | 56 |
| 2 | Japan | 1965 | 2021 | 56 |
| 3 | Germany | 1965 | 2021 | 56 |
| 4 | Canada | 1965 | 2021 | 56 |
| 5 | United Kingdom | 1965 | 2021 | 56 |
| 6 | Sweden | 1965 | 2021 | 56 |
| 7 | Belgium | 1966 | 2021 | 55 |
| 8 | Spain | 1968 | 2021 | 53 |
| 9 | Netherlands | 1968 | 2021 | 53 |
| 10 | Switzerland | 1969 | 2021 | 52 |
| 11 | India | 1969 | 2021 | 52 |
| 12 | Pakistan | 1971 | 2021 | 50 |
| 13 | Slovakia | 1972 | 2021 | 49 |
| 14 | Bulgaria | 1974 | 2021 | 47 |
| 15 | Argentina | 1974 | 2021 | 47 |
| 16 | South Korea | 1977 | 2021 | 44 |
| 17 | Taiwan | 1977 | 2021 | 44 |
| 18 | Finland | 1977 | 2021 | 44 |
| 19 | Czechia | 1983 | 2021 | 38 |
| 20 | Hungary | 1983 | 2021 | 38 |
| 21 | South Africa | 1984 | 2021 | 37 |
| 22 | Brazil | 1984 | 2021 | 37 |
| 23 | Russia | 1985 | 2021 | 36 |
| 24 | Ukraine | 1985 | 2021 | 36 |
| 25 | Mexico | 1989 | 2021 | 32 |
| 26 | Slovenia | 1990 | 2021 | 31 |
| 27 | China | 1993 | 2021 | 28 |
| 28 | Romania | 1996 | 2021 | 25 |
| 29 | Lithuania | 1985 | 2009 | 24 |
| 30 | Italy | 1965 | 1987 | 22 |
| 31 | Armenia | 2000 | 2021 | 21 |
| 32 | Iran | 2011 | 2021 | 10 |
| 33 | Kazakhstan | 1989 | 1998 | 9 |
| 34 | United Arab Emirates | 2020 | 2021 | 1 |
| 35 | Belarus | 2020 | 2021 | 1 |
The answer given above is a little bit inaccurate in the sense that the oldest year record available in the .csv file is only from the year 1965.
This may give a false sense that for example United States started producing electricity from Nuclear from year 1965 even though in reality it started doing so from the year 1957.
Reference: - https://en.wikipedia.org/wiki/Nuclear_power_in_the_United_States#Start_of_commercial_nuclear_power
We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.
# Query_10
engine.execute('''ALTER TABLE nuclear RENAME COLUMN Entity TO Country;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bc8d090>
# Query_11
# Checking whether the column name has been changed or not.
query_table = pd.read_sql_query('''DESCRIBE nuclear;''', engine)
display(query_table)
| Field | Type | Null | Key | Default | Extra | |
|---|---|---|---|---|---|---|
| 0 | Country | text | YES | None | ||
| 1 | Code | text | YES | None | ||
| 2 | Year | bigint(20) | YES | None | ||
| 3 | Electricity from nuclear (TWh) | double | YES | None |
We can see that our column name has been indeed changed from Entity to Country.
Dear reader, I shall be executing the same 11 queries executed above for the Hydro Dataset and the rest of the energy-datasets as well. New queries are expected to come from Section 8 where I would be taking out Clean energy % by including the Energy.csv.
Skip to section 8, but do look at Inference in order to see different results specific to Hydro Dataset.
# Reading hydropower-consumption.csv file
df_hydro = pd.read_csv("hydropower-consumption.csv")
#display(df_hydro)
# Writing the Hydro dataframe into a table
# Reference : - https://www.w3resource.com/pandas/dataframe/dataframe-to_sql.php
df_hydro.to_sql("hydro", engine, if_exists = 'replace', index=False, method='multi') # Pushes all data to MySql
# Demonstartation of the fact that import has been successful.
df_hydro_table = pd.read_sql_table("hydro", engine) # Reading the data.
df_hydro_table.head(10)
| Entity | Code | Year | Electricity from hydro (TWh) | |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2000 | 0.31 |
| 1 | Afghanistan | AFG | 2001 | 0.50 |
| 2 | Afghanistan | AFG | 2002 | 0.56 |
| 3 | Afghanistan | AFG | 2003 | 0.63 |
| 4 | Afghanistan | AFG | 2004 | 0.56 |
| 5 | Afghanistan | AFG | 2005 | 0.59 |
| 6 | Afghanistan | AFG | 2006 | 0.64 |
| 7 | Afghanistan | AFG | 2007 | 0.75 |
| 8 | Afghanistan | AFG | 2008 | 0.54 |
| 9 | Afghanistan | AFG | 2009 | 0.78 |
The columns of Hydro.csv are similar to that of nuclear.csv. For their interpretation please look before Query 1.
# Query_12
# Checking those records where 'Code' is NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT `Entity` FROM hydro
WHERE `Code` IS NULL;''', engine)
display(query_table)
| Entity | |
|---|---|
| 0 | Africa |
| 1 | Africa (BP) |
| 2 | Asia |
| 3 | Asia Pacific (BP) |
| 4 | CIS (BP) |
| 5 | Central America (BP) |
| 6 | Eastern Africa (BP) |
| 7 | Europe |
| 8 | Europe (BP) |
| 9 | European Union (27) |
| 10 | G20 (Ember) |
| 11 | G7 (Ember) |
| 12 | High-income countries |
| 13 | Latin America and Caribbean (Ember) |
| 14 | Low-income countries |
| 15 | Lower-middle-income countries |
| 16 | Middle Africa (BP) |
| 17 | Middle East (BP) |
| 18 | Middle East (Ember) |
| 19 | Non-OECD (BP) |
| 20 | North America |
| 21 | North America (BP) |
| 22 | OECD (BP) |
| 23 | OECD (Ember) |
| 24 | Oceania |
| 25 | Other Africa (BP) |
| 26 | Other Asia Pacific (BP) |
| 27 | Other CIS (BP) |
| 28 | Other Caribbean (BP) |
| 29 | Other Europe (BP) |
| 30 | Other Middle East (BP) |
| 31 | Other Northern Africa (BP) |
| 32 | Other South America (BP) |
| 33 | Other South and Central America (BP) |
| 34 | Other Southern Africa (BP) |
| 35 | South America |
| 36 | South and Central America (BP) |
| 37 | Upper-middle-income countries |
| 38 | Western Africa (BP) |
For our project we are only concerned with Countries and not regions. Therefore we shall delete these items.
# Query_13
engine.execute('''DELETE FROM hydro WHERE `CODE` IS NULL;''')
# Here we are deleting all the rows whose Entity was a Region and not a Country.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff4382bb4d0>
# Query_14
# We shall now confirm that those records have been deleted where Entity IS NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Code`), `Entity` FROM hydro;''', engine)
display(query_table)
| Code | Entity | |
|---|---|---|
| 0 | AFG | Afghanistan |
| 1 | ALB | Albania |
| 2 | DZA | Algeria |
| 3 | ASM | American Samoa |
| 4 | AGO | Angola |
| ... | ... | ... |
| 212 | ESH | Western Sahara |
| 213 | OWID_WRL | World |
| 214 | YEM | Yemen |
| 215 | ZMB | Zambia |
| 216 | ZWE | Zimbabwe |
217 rows × 2 columns
From the list above, we can confirm that no NULL value is left in the table.
# Query_15
query_table = pd.read_sql_query('''SELECT `Entity`, `Code`, CHAR_LENGTH(`CODE`) AS Length
FROM hydro
GROUP BY `Entity`
HAVING CHAR_LENGTH(`CODE`) <> 3;''', engine)
# In this SQL query we are looking out for all records where Character length of 'Code' column is not equal to 3.
display(query_table)
| Entity | Code | Length | |
|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 8 |
| 1 | USSR | OWID_USS | 8 |
| 2 | World | OWID_WRL | 8 |
# Query_16
# Seeing those records in its entirety where Code length NOT EQUAL to 3.
query_table = pd.read_sql_query('''SELECT *
FROM hydro
WHERE `Entity` IN ('Kosovo', 'USSR', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity from hydro (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 0.050000 |
| 1 | Kosovo | OWID_KOS | 2001 | 0.050000 |
| 2 | Kosovo | OWID_KOS | 2002 | 0.050000 |
| 3 | Kosovo | OWID_KOS | 2003 | 0.050000 |
| 4 | Kosovo | OWID_KOS | 2004 | 0.110000 |
| ... | ... | ... | ... | ... |
| 93 | World | OWID_WRL | 2017 | 4069.982666 |
| 94 | World | OWID_WRL | 2018 | 4183.166504 |
| 95 | World | OWID_WRL | 2019 | 4231.376953 |
| 96 | World | OWID_WRL | 2020 | 4345.990234 |
| 97 | World | OWID_WRL | 2021 | 4273.827637 |
98 rows × 4 columns
# Query_17
engine.execute('''DELETE FROM hydro WHERE `Entity` IN ('Kosovo', 'USSR', 'World');''')
# Here we are deleting all the rows for Kosovo, USSR and World for the reasons as stated above.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bcbfb50>
# Query_18
# The query given below shall help us give distinct Entites and the sum of electricity they have generated from Hydro power.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Entity`), SUM(`Electricity from hydro (TWh)`) AS Total_Hydro_Energy
FROM hydro
GROUP BY `Entity`
HAVING Total_Hydro_Energy <> 0
ORDER BY Total_Hydro_Energy DESC;''', engine)
display(query_table)
| Entity | Total_Hydro_Energy | |
|---|---|---|
| 0 | China | 19810.906448 |
| 1 | Canada | 16948.565735 |
| 2 | United States | 15673.951126 |
| 3 | Brazil | 13023.484613 |
| 4 | Russia | 6362.678513 |
| ... | ... | ... |
| 149 | Estonia | 0.509754 |
| 150 | Saint Vincent and the Grenadines | 0.490000 |
| 151 | Sao Tome and Principe | 0.180000 |
| 152 | Turkmenistan | 0.131225 |
| 153 | Guadeloupe | 0.030000 |
154 rows × 2 columns
There are 154 distinct countries! which has generated electricity from eco-friendly Hydro Power, which is way higher than nuclear countries (in total of 36 countries).
# Query_19
# The query given below shall tell us the starting year (oldest year) and the latest year (that is available in the dataset) of hydro energy production.
query_table = pd.read_sql_query('''SELECT DISTINCT (`Entity`), MIN(`Year`) AS Oldest_year, MAX(`YEAR`) AS Latest_year, MAX(`YEAR`) - MIN(`Year`) AS Duration
FROM hydro
WHERE `Electricity from hydro (TWh)` <> 0
GROUP BY `Entity`
ORDER BY Duration DESC, SUM(`Electricity from hydro (TWh)`) DESC;''', engine)
display(query_table)
| Entity | Oldest_year | Latest_year | Duration | |
|---|---|---|---|---|
| 0 | China | 1965 | 2021 | 56 |
| 1 | Canada | 1965 | 2021 | 56 |
| 2 | United States | 1965 | 2021 | 56 |
| 3 | Brazil | 1965 | 2021 | 56 |
| 4 | Norway | 1965 | 2021 | 56 |
| ... | ... | ... | ... | ... |
| 149 | Senegal | 2002 | 2021 | 19 |
| 150 | Montenegro | 2005 | 2021 | 16 |
| 151 | Mongolia | 2010 | 2021 | 11 |
| 152 | Liberia | 2020 | 2020 | 0 |
| 153 | Guadeloupe | 2020 | 2020 | 0 |
154 rows × 4 columns
# Query_20
# We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.
engine.execute('''ALTER TABLE hydro RENAME COLUMN Entity TO Country;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff448391e10>
# Query_21
# Checking whether the column name has been changed or not.
query_table = pd.read_sql_query('''DESCRIBE hydro;''', engine)
display(query_table)
| Field | Type | Null | Key | Default | Extra | |
|---|---|---|---|---|---|---|
| 0 | Country | text | YES | None | ||
| 1 | Code | text | YES | None | ||
| 2 | Year | bigint(20) | YES | None | ||
| 3 | Electricity from hydro (TWh) | double | YES | None |
Skip to section 8, but do look at Inference in order to see different results specific to Wind Dataset.
# Reading wind-generation.csv file
df_wind = pd.read_csv("wind-generation.csv")
#display(df_wind)
# Writing the Wind dataframe into a table
# Reference : - https://www.w3resource.com/pandas/dataframe/dataframe-to_sql.php
df_wind.to_sql("wind", engine, if_exists = 'replace', index=False, method='multi') # Pushes all data to MySql
# Demonstartation of the fact that import has been successful.
df_wind_table = pd.read_sql_table("wind", engine) # Reading the data.
df_wind_table.head(10)
| Entity | Code | Year | Electricity from wind (TWh) | |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2000 | 0.0 |
| 1 | Afghanistan | AFG | 2001 | 0.0 |
| 2 | Afghanistan | AFG | 2002 | 0.0 |
| 3 | Afghanistan | AFG | 2003 | 0.0 |
| 4 | Afghanistan | AFG | 2004 | 0.0 |
| 5 | Afghanistan | AFG | 2005 | 0.0 |
| 6 | Afghanistan | AFG | 2006 | 0.0 |
| 7 | Afghanistan | AFG | 2007 | 0.0 |
| 8 | Afghanistan | AFG | 2008 | 0.0 |
| 9 | Afghanistan | AFG | 2009 | 0.0 |
# Query_22
# Checking those records where 'Code' is NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT `Entity` FROM wind
WHERE `Code` IS NULL;''', engine)
display(query_table)
| Entity | |
|---|---|
| 0 | Africa |
| 1 | Africa (BP) |
| 2 | Asia |
| 3 | Asia Pacific (BP) |
| 4 | CIS (BP) |
| 5 | Central America (BP) |
| 6 | Eastern Africa (BP) |
| 7 | Europe |
| 8 | Europe (BP) |
| 9 | European Union (27) |
| 10 | G20 (Ember) |
| 11 | G7 (Ember) |
| 12 | High-income countries |
| 13 | Latin America and Caribbean (Ember) |
| 14 | Low-income countries |
| 15 | Lower-middle-income countries |
| 16 | Middle Africa (BP) |
| 17 | Middle East (BP) |
| 18 | Middle East (Ember) |
| 19 | Non-OECD (BP) |
| 20 | North America |
| 21 | North America (BP) |
| 22 | OECD (BP) |
| 23 | OECD (Ember) |
| 24 | Oceania |
| 25 | Other Africa (BP) |
| 26 | Other Asia Pacific (BP) |
| 27 | Other CIS (BP) |
| 28 | Other Caribbean (BP) |
| 29 | Other Europe (BP) |
| 30 | Other Middle East (BP) |
| 31 | Other Northern Africa (BP) |
| 32 | Other South America (BP) |
| 33 | Other South and Central America (BP) |
| 34 | Other Southern Africa (BP) |
| 35 | South America |
| 36 | South and Central America (BP) |
| 37 | Upper-middle-income countries |
| 38 | Western Africa (BP) |
# Query_23
engine.execute('''DELETE FROM wind WHERE `CODE` IS NULL;''')
# Here we are deleting all the rows whose Entity was a Region and not a Country.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff44836cfd0>
# Query_24
# We shall now confirm that those records have been deleted where Entity IS NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Code`), `Entity` FROM wind;''', engine)
display(query_table)
| Code | Entity | |
|---|---|---|
| 0 | AFG | Afghanistan |
| 1 | ALB | Albania |
| 2 | DZA | Algeria |
| 3 | ASM | American Samoa |
| 4 | AGO | Angola |
| ... | ... | ... |
| 212 | ESH | Western Sahara |
| 213 | OWID_WRL | World |
| 214 | YEM | Yemen |
| 215 | ZMB | Zambia |
| 216 | ZWE | Zimbabwe |
217 rows × 2 columns
# Query_25
query_table = pd.read_sql_query('''SELECT `Entity`, `Code`, CHAR_LENGTH(`CODE`) AS Length
FROM wind
GROUP BY `Entity`
HAVING CHAR_LENGTH(`CODE`) <> 3;''', engine)
# In this SQL query we are looking out for all records where Character length of 'Code' column is not equal to 3.
display(query_table)
| Entity | Code | Length | |
|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 8 |
| 1 | USSR | OWID_USS | 8 |
| 2 | World | OWID_WRL | 8 |
# Query_26
# Seeing those records in its entirety where Code length NOT EQUAL to 3.
query_table = pd.read_sql_query('''SELECT *
FROM wind
WHERE `Entity` IN ('Kosovo', 'USSR', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity from wind (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 0.000000 |
| 1 | Kosovo | OWID_KOS | 2001 | 0.000000 |
| 2 | Kosovo | OWID_KOS | 2002 | 0.000000 |
| 3 | Kosovo | OWID_KOS | 2003 | 0.000000 |
| 4 | Kosovo | OWID_KOS | 2004 | 0.000000 |
| ... | ... | ... | ... | ... |
| 74 | World | OWID_WRL | 2017 | 1140.393066 |
| 75 | World | OWID_WRL | 2018 | 1269.979248 |
| 76 | World | OWID_WRL | 2019 | 1420.544067 |
| 77 | World | OWID_WRL | 2020 | 1596.428223 |
| 78 | World | OWID_WRL | 2021 | 1861.939819 |
79 rows × 4 columns
# Query_27
engine.execute('''DELETE FROM wind WHERE `Entity` IN ('Kosovo', 'USSR', 'World');''')
# Here we are deleting all the rows for Kosovo, USSR and World for the reasons as stated above.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bca3150>
# Query_28
# The query given below shall help us give distinct Entites and the sum of electricity they have generated from wind power.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Entity`), SUM(`Electricity from wind (TWh)`) AS Total_Wind_Energy
FROM wind
GROUP BY `Entity`
HAVING Total_Wind_Energy <> 0
ORDER BY Total_Wind_Energy DESC;''', engine)
display(query_table)
| Entity | Total_Wind_Energy | |
|---|---|---|
| 0 | China | 3207.577394 |
| 1 | United States | 2983.782956 |
| 2 | Germany | 1283.028996 |
| 3 | Spain | 810.899940 |
| 4 | India | 591.921919 |
| ... | ... | ... |
| 109 | Seychelles | 0.030000 |
| 110 | Uzbekistan | 0.015746 |
| 111 | Hong Kong | 0.013532 |
| 112 | Armenia | 0.010000 |
| 113 | United Arab Emirates | 0.005956 |
114 rows × 2 columns
There are 114 distinct countries! which has generated electricity from eco-friendly wind Power, which is way higher than nuclear countries (in total of 36 countries) and lower than Hydro (in total of 154 countries).
# Query_29
# The query given below shall tell us the starting year (oldest year) and the latest year (that is available in the dataset) of wind energy production.
query_table = pd.read_sql_query('''SELECT DISTINCT (`Entity`), MIN(`Year`) AS Oldest_year, MAX(`YEAR`) AS Latest_year, MAX(`YEAR`) - MIN(`Year`) AS Duration
FROM wind
WHERE `Electricity from wind (TWh)` <> 0
GROUP BY `Entity`
ORDER BY Duration DESC, SUM(`Electricity from wind (TWh)`) DESC;''', engine)
display(query_table)
| Entity | Oldest_year | Latest_year | Duration | |
|---|---|---|---|---|
| 0 | Denmark | 1978 | 2021 | 43 |
| 1 | United States | 1983 | 2021 | 38 |
| 2 | Sweden | 1983 | 2021 | 38 |
| 3 | Canada | 1985 | 2021 | 36 |
| 4 | Germany | 1986 | 2021 | 35 |
| ... | ... | ... | ... | ... |
| 109 | Seychelles | 2013 | 2015 | 2 |
| 110 | Uzbekistan | 2019 | 2021 | 2 |
| 111 | United Arab Emirates | 2013 | 2015 | 2 |
| 112 | El Salvador | 2021 | 2021 | 0 |
| 113 | Armenia | 2010 | 2010 | 0 |
114 rows × 4 columns
# Query_30
# We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.
engine.execute('''ALTER TABLE wind RENAME COLUMN Entity TO Country;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff43988e850>
# Query_31
# Checking whether the column name has been changed or not.
query_table = pd.read_sql_query('''DESCRIBE wind;''', engine)
display(query_table)
| Field | Type | Null | Key | Default | Extra | |
|---|---|---|---|---|---|---|
| 0 | Country | text | YES | None | ||
| 1 | Code | text | YES | None | ||
| 2 | Year | bigint(20) | YES | None | ||
| 3 | Electricity from wind (TWh) | double | YES | None |
Skip to section 8, but do look at Inference in order to see different results specific to Solar Dataset.
# Reading solar-energy-consumption.csv file
df_solar = pd.read_csv("solar-energy-consumption.csv")
#display(df_solar)
# Writing the Solar dataframe into a table
# Reference : - https://www.w3resource.com/pandas/dataframe/dataframe-to_sql.php
df_solar.to_sql("solar", engine, if_exists = 'replace', index=False, method='multi') # Pushes all data to MySql
# Demonstartation of the fact that import has been successful.
df_solar_table = pd.read_sql_table("solar", engine) # Reading the data.
df_solar_table.head(10)
| Entity | Code | Year | Electricity from solar (TWh) | |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2000 | 0.0 |
| 1 | Afghanistan | AFG | 2001 | 0.0 |
| 2 | Afghanistan | AFG | 2002 | 0.0 |
| 3 | Afghanistan | AFG | 2003 | 0.0 |
| 4 | Afghanistan | AFG | 2004 | 0.0 |
| 5 | Afghanistan | AFG | 2005 | 0.0 |
| 6 | Afghanistan | AFG | 2006 | 0.0 |
| 7 | Afghanistan | AFG | 2007 | 0.0 |
| 8 | Afghanistan | AFG | 2008 | 0.0 |
| 9 | Afghanistan | AFG | 2009 | 0.0 |
# Query_32
# Checking those records where 'Code' is NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT `Entity` FROM solar
WHERE `Code` IS NULL;''', engine)
display(query_table)
| Entity | |
|---|---|
| 0 | Africa |
| 1 | Africa (BP) |
| 2 | Asia |
| 3 | Asia Pacific (BP) |
| 4 | CIS (BP) |
| 5 | Central America (BP) |
| 6 | Eastern Africa (BP) |
| 7 | Europe |
| 8 | Europe (BP) |
| 9 | European Union (27) |
| 10 | G20 (Ember) |
| 11 | G7 (Ember) |
| 12 | High-income countries |
| 13 | Latin America and Caribbean (Ember) |
| 14 | Low-income countries |
| 15 | Lower-middle-income countries |
| 16 | Middle Africa (BP) |
| 17 | Middle East (BP) |
| 18 | Middle East (Ember) |
| 19 | Non-OECD (BP) |
| 20 | North America |
| 21 | North America (BP) |
| 22 | OECD (BP) |
| 23 | OECD (Ember) |
| 24 | Oceania |
| 25 | Other Africa (BP) |
| 26 | Other Asia Pacific (BP) |
| 27 | Other CIS (BP) |
| 28 | Other Caribbean (BP) |
| 29 | Other Europe (BP) |
| 30 | Other Middle East (BP) |
| 31 | Other Northern Africa (BP) |
| 32 | Other South America (BP) |
| 33 | Other South and Central America (BP) |
| 34 | Other Southern Africa (BP) |
| 35 | South America |
| 36 | South and Central America (BP) |
| 37 | Upper-middle-income countries |
| 38 | Western Africa (BP) |
# Query_33
engine.execute('''DELETE FROM solar WHERE `CODE` IS NULL;''')
# Here we are deleting all the rows whose Entity was a Region and not a Country.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bca3f10>
# Query_34
# We shall now confirm that those records have been deleted where Entity IS NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Code`), `Entity` FROM solar;''', engine)
display(query_table)
| Code | Entity | |
|---|---|---|
| 0 | AFG | Afghanistan |
| 1 | ALB | Albania |
| 2 | DZA | Algeria |
| 3 | ASM | American Samoa |
| 4 | AGO | Angola |
| ... | ... | ... |
| 212 | ESH | Western Sahara |
| 213 | OWID_WRL | World |
| 214 | YEM | Yemen |
| 215 | ZMB | Zambia |
| 216 | ZWE | Zimbabwe |
217 rows × 2 columns
# Query_35
query_table = pd.read_sql_query('''SELECT `Entity`, `Code`, CHAR_LENGTH(`CODE`) AS Length
FROM solar
GROUP BY `Entity`
HAVING CHAR_LENGTH(`CODE`) <> 3;''', engine)
# In this SQL query we are looking out for all records where Character length of 'Code' column is not equal to 3.
display(query_table)
| Entity | Code | Length | |
|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 8 |
| 1 | USSR | OWID_USS | 8 |
| 2 | World | OWID_WRL | 8 |
# Query_36
# Seeing those records in its entirety where Code length NOT EQUAL to 3.
query_table = pd.read_sql_query('''SELECT *
FROM solar
WHERE `Entity` IN ('Kosovo', 'USSR', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity from solar (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 0.000000 |
| 1 | Kosovo | OWID_KOS | 2001 | 0.000000 |
| 2 | Kosovo | OWID_KOS | 2002 | 0.000000 |
| 3 | Kosovo | OWID_KOS | 2003 | 0.000000 |
| 4 | Kosovo | OWID_KOS | 2004 | 0.000000 |
| ... | ... | ... | ... | ... |
| 74 | World | OWID_WRL | 2017 | 445.466278 |
| 75 | World | OWID_WRL | 2018 | 576.230896 |
| 76 | World | OWID_WRL | 2019 | 703.949768 |
| 77 | World | OWID_WRL | 2020 | 846.229370 |
| 78 | World | OWID_WRL | 2021 | 1032.501221 |
79 rows × 4 columns
# Query_37
engine.execute('''DELETE FROM solar WHERE `Entity` IN ('Kosovo', 'USSR', 'World');''')
# Here we are deleting all the rows for Kosovo, USSR and World for the reasons as stated above.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bc65210>
# Query_38
# The query given below shall help us give distinct Entites and the sum of electricity they have generated from solar power.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Entity`), SUM(`Electricity from solar (TWh)`) AS Total_Solar_Energy
FROM solar
GROUP BY `Entity`
HAVING Total_Solar_Energy <> 0
ORDER BY Total_Solar_Energy DESC;''', engine)
display(query_table)
| Entity | Total_Solar_Energy | |
|---|---|---|
| 0 | China | 1252.639076 |
| 1 | United States | 749.182240 |
| 2 | Japan | 492.576158 |
| 3 | Germany | 440.263997 |
| 4 | India | 260.949518 |
| ... | ... | ... |
| 149 | Democratic Republic of Congo | 0.040000 |
| 150 | Sierra Leone | 0.040000 |
| 151 | Vanuatu | 0.030000 |
| 152 | Benin | 0.020000 |
| 153 | Latvia | 0.015913 |
154 rows × 2 columns
There are 153 distinct countries! which has generated electricity from eco-friendly Solar Power, which is way higher than nuclear countries (in total of 36 countries) and wind countries (in total of 114) and lower than Hydro (in total of 154 countries).
# Query_39
# The query given below shall tell us the starting year (oldest year) and the latest year (that is available in the dataset) of solar energy production.
query_table = pd.read_sql_query('''SELECT DISTINCT (`Entity`), MIN(`Year`) AS Oldest_year, MAX(`YEAR`) AS Latest_year, MAX(`YEAR`) - MIN(`Year`) AS Duration
FROM solar
WHERE `Electricity from solar (TWh)` <> 0
GROUP BY `Entity`
ORDER BY Duration DESC, SUM(`Electricity from solar (TWh)`) DESC;''', engine)
display(query_table)
| Entity | Oldest_year | Latest_year | Duration | |
|---|---|---|---|---|
| 0 | United States | 1983 | 2021 | 38 |
| 1 | United Kingdom | 1984 | 2021 | 37 |
| 2 | Italy | 1989 | 2021 | 32 |
| 3 | Spain | 1989 | 2021 | 32 |
| 4 | Portugal | 1989 | 2021 | 32 |
| ... | ... | ... | ... | ... |
| 149 | Vanuatu | 2018 | 2020 | 2 |
| 150 | Zambia | 2019 | 2020 | 1 |
| 151 | Mozambique | 2019 | 2020 | 1 |
| 152 | Albania | 2019 | 2020 | 1 |
| 153 | Benin | 2019 | 2020 | 1 |
154 rows × 4 columns
# Query_40
# We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.
engine.execute('''ALTER TABLE solar RENAME COLUMN Entity TO Country;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bc661d0>
# Query_41
# Checking whether the column name has been changed or not.
query_table = pd.read_sql_query('''DESCRIBE solar;''', engine)
display(query_table)
| Field | Type | Null | Key | Default | Extra | |
|---|---|---|---|---|---|---|
| 0 | Country | text | YES | None | ||
| 1 | Code | text | YES | None | ||
| 2 | Year | bigint(20) | YES | None | ||
| 3 | Electricity from solar (TWh) | double | YES | None |
Skip to section 8, but do look at Inference in order to see different results specific to Coal Dataset.
# Reading electricity-coal.csv file
df_coal = pd.read_csv("electricity-coal.csv")
#display(df_coal)
# Writing the Coal dataframe into a table
# Reference : - https://www.w3resource.com/pandas/dataframe/dataframe-to_sql.php
df_coal.to_sql("coal", engine, if_exists = 'replace', index=False, method='multi') # Pushes all data to MySql
# Demonstartation of the fact that import has been successful.
df_coal_table = pd.read_sql_table("coal", engine) # Reading the data.
df_coal_table.head(10)
| Entity | Code | Year | Electricity from coal (TWh) | |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2000 | 0.0 |
| 1 | Afghanistan | AFG | 2001 | 0.0 |
| 2 | Afghanistan | AFG | 2002 | 0.0 |
| 3 | Afghanistan | AFG | 2003 | 0.0 |
| 4 | Afghanistan | AFG | 2004 | 0.0 |
| 5 | Afghanistan | AFG | 2005 | 0.0 |
| 6 | Afghanistan | AFG | 2006 | 0.0 |
| 7 | Afghanistan | AFG | 2007 | 0.0 |
| 8 | Afghanistan | AFG | 2008 | 0.0 |
| 9 | Afghanistan | AFG | 2009 | 0.0 |
# Query_42
# Checking those records where 'Code' is NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT `Entity` FROM coal
WHERE `Code` IS NULL;''', engine)
display(query_table)
| Entity | |
|---|---|
| 0 | Africa |
| 1 | Africa (BP) |
| 2 | Asia |
| 3 | Asia Pacific (BP) |
| 4 | CIS (BP) |
| 5 | Europe |
| 6 | Europe (BP) |
| 7 | European Union (27) |
| 8 | G20 (Ember) |
| 9 | G7 (Ember) |
| 10 | High-income countries |
| 11 | Latin America and Caribbean (Ember) |
| 12 | Low-income countries |
| 13 | Lower-middle-income countries |
| 14 | Middle East (BP) |
| 15 | Middle East (Ember) |
| 16 | Non-OECD (BP) |
| 17 | North America |
| 18 | North America (BP) |
| 19 | OECD (BP) |
| 20 | OECD (Ember) |
| 21 | Oceania |
| 22 | Other Africa (BP) |
| 23 | Other Asia Pacific (BP) |
| 24 | Other CIS (BP) |
| 25 | Other Europe (BP) |
| 26 | Other Middle East (BP) |
| 27 | Other Northern Africa (BP) |
| 28 | Other South America (BP) |
| 29 | Other South and Central America (BP) |
| 30 | Other Southern Africa (BP) |
| 31 | South America |
| 32 | South and Central America (BP) |
| 33 | Upper-middle-income countries |
# Query_43
engine.execute('''DELETE FROM coal WHERE `CODE` IS NULL;''')
# Here we are deleting all the rows whose Entity was a Region and not a Country.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff428044990>
# Query_44
# We shall now confirm that those records have been deleted where Entity IS NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Code`), `Entity` FROM coal;''', engine)
display(query_table)
| Code | Entity | |
|---|---|---|
| 0 | AFG | Afghanistan |
| 1 | ALB | Albania |
| 2 | DZA | Algeria |
| 3 | ASM | American Samoa |
| 4 | AGO | Angola |
| ... | ... | ... |
| 208 | ESH | Western Sahara |
| 209 | OWID_WRL | World |
| 210 | YEM | Yemen |
| 211 | ZMB | Zambia |
| 212 | ZWE | Zimbabwe |
213 rows × 2 columns
# Query_45
query_table = pd.read_sql_query('''SELECT `Entity`, `Code`, CHAR_LENGTH(`CODE`) AS Length
FROM coal
GROUP BY `Entity`
HAVING CHAR_LENGTH(`CODE`) <> 3;''', engine)
# In this SQL query we are looking out for all records where Character length of 'Code' column is not equal to 3.
display(query_table)
| Entity | Code | Length | |
|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 8 |
| 1 | World | OWID_WRL | 8 |
# Query_46
# Seeing those records in its entirety where Code length NOT EQUAL to 3.
query_table = pd.read_sql_query('''SELECT *
FROM coal
WHERE `Entity` IN ('Kosovo', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity from coal (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 2.890000 |
| 1 | Kosovo | OWID_KOS | 2001 | 3.660000 |
| 2 | Kosovo | OWID_KOS | 2002 | 3.640000 |
| 3 | Kosovo | OWID_KOS | 2003 | 3.550000 |
| 4 | Kosovo | OWID_KOS | 2004 | 3.940000 |
| 5 | Kosovo | OWID_KOS | 2005 | 4.320000 |
| 6 | Kosovo | OWID_KOS | 2006 | 4.300000 |
| 7 | Kosovo | OWID_KOS | 2007 | 4.710000 |
| 8 | Kosovo | OWID_KOS | 2008 | 5.050000 |
| 9 | Kosovo | OWID_KOS | 2009 | 4.840000 |
| 10 | Kosovo | OWID_KOS | 2010 | 4.990000 |
| 11 | Kosovo | OWID_KOS | 2011 | 5.680000 |
| 12 | Kosovo | OWID_KOS | 2012 | 5.830000 |
| 13 | Kosovo | OWID_KOS | 2013 | 6.370000 |
| 14 | Kosovo | OWID_KOS | 2014 | 5.270000 |
| 15 | Kosovo | OWID_KOS | 2015 | 5.960000 |
| 16 | Kosovo | OWID_KOS | 2016 | 5.720000 |
| 17 | Kosovo | OWID_KOS | 2017 | 5.730000 |
| 18 | Kosovo | OWID_KOS | 2018 | 5.600000 |
| 19 | Kosovo | OWID_KOS | 2019 | 6.020000 |
| 20 | Kosovo | OWID_KOS | 2020 | 6.330000 |
| 21 | World | OWID_WRL | 1985 | 3748.311279 |
| 22 | World | OWID_WRL | 1986 | 3838.927002 |
| 23 | World | OWID_WRL | 1987 | 4057.907959 |
| 24 | World | OWID_WRL | 1988 | 4200.404785 |
| 25 | World | OWID_WRL | 1989 | 4376.661133 |
| 26 | World | OWID_WRL | 1990 | 4459.965332 |
| 27 | World | OWID_WRL | 1991 | 4556.455078 |
| 28 | World | OWID_WRL | 1992 | 4648.500000 |
| 29 | World | OWID_WRL | 1993 | 4724.649902 |
| 30 | World | OWID_WRL | 1994 | 4890.302246 |
| 31 | World | OWID_WRL | 1995 | 5037.376465 |
| 32 | World | OWID_WRL | 1996 | 5277.954102 |
| 33 | World | OWID_WRL | 1997 | 5393.560547 |
| 34 | World | OWID_WRL | 1998 | 5504.109375 |
| 35 | World | OWID_WRL | 1999 | 5622.301270 |
| 36 | World | OWID_WRL | 2000 | 5986.105469 |
| 37 | World | OWID_WRL | 2001 | 6066.335938 |
| 38 | World | OWID_WRL | 2002 | 6316.846680 |
| 39 | World | OWID_WRL | 2003 | 6762.136719 |
| 40 | World | OWID_WRL | 2004 | 6985.804199 |
| 41 | World | OWID_WRL | 2005 | 7357.744629 |
| 42 | World | OWID_WRL | 2006 | 7759.789062 |
| 43 | World | OWID_WRL | 2007 | 8250.479492 |
| 44 | World | OWID_WRL | 2008 | 8265.874023 |
| 45 | World | OWID_WRL | 2009 | 8115.216797 |
| 46 | World | OWID_WRL | 2010 | 8634.277344 |
| 47 | World | OWID_WRL | 2011 | 9075.767578 |
| 48 | World | OWID_WRL | 2012 | 9107.228516 |
| 49 | World | OWID_WRL | 2013 | 9576.637695 |
| 50 | World | OWID_WRL | 2014 | 9752.349609 |
| 51 | World | OWID_WRL | 2015 | 9406.545898 |
| 52 | World | OWID_WRL | 2016 | 9423.474609 |
| 53 | World | OWID_WRL | 2017 | 9716.676758 |
| 54 | World | OWID_WRL | 2018 | 10097.677734 |
| 55 | World | OWID_WRL | 2019 | 9863.102539 |
| 56 | World | OWID_WRL | 2020 | 9439.265625 |
| 57 | World | OWID_WRL | 2021 | 10243.970703 |
# Query_47
engine.execute('''DELETE FROM coal WHERE `Entity` IN ('Kosovo', 'World');''')
# Here we are deleting all the rows for Kosovo and World for the reasons as stated above.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff42803e4d0>
# Query_48
# The query given below shall help us give distinct Entites and the sum of electricity they have generated from coal power.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Entity`), SUM(`Electricity from coal (TWh)`) AS Total_Coal_Energy
FROM coal
GROUP BY `Entity`
HAVING Total_Coal_Energy <> 0
ORDER BY Total_Coal_Energy DESC;''', engine)
display(query_table)
| Entity | Total_Coal_Energy | |
|---|---|---|
| 0 | China | 79501.712036 |
| 1 | United States | 64988.924500 |
| 2 | India | 20287.559601 |
| 3 | Germany | 10224.626053 |
| 4 | Japan | 8894.512680 |
| ... | ... | ... |
| 81 | Syria | 1.480000 |
| 82 | Norway | 1.360000 |
| 83 | Namibia | 0.600000 |
| 84 | Georgia | 0.050000 |
| 85 | Estonia | 0.040000 |
86 rows × 2 columns
There are 86 distinct countries! which has generated electricity from Dirty-fuel source Coal Power.
Till now we have observed the following: -
# Query_49
# The query given below shall tell us the starting year (oldest year) and the latest year (that is available in the dataset) of coal energy production.
query_table = pd.read_sql_query('''SELECT DISTINCT (`Entity`), MIN(`Year`) AS Oldest_year, MAX(`YEAR`) AS Latest_year, MAX(`YEAR`) - MIN(`Year`) AS Duration
FROM coal
WHERE `Electricity from coal (TWh)` <> 0
GROUP BY `Entity`
ORDER BY Duration DESC, SUM(`Electricity from coal (TWh)`) DESC;''', engine)
display(query_table)
| Entity | Oldest_year | Latest_year | Duration | |
|---|---|---|---|---|
| 0 | China | 1985 | 2021 | 36 |
| 1 | United States | 1985 | 2021 | 36 |
| 2 | India | 1985 | 2021 | 36 |
| 3 | Germany | 1985 | 2021 | 36 |
| 4 | Japan | 1985 | 2021 | 36 |
| ... | ... | ... | ... | ... |
| 81 | Malta | 1990 | 1995 | 5 |
| 82 | Honduras | 2015 | 2020 | 5 |
| 83 | Estonia | 2013 | 2017 | 4 |
| 84 | Georgia | 2016 | 2018 | 2 |
| 85 | Brunei | 2019 | 2020 | 1 |
86 rows × 4 columns
# Query_50
# We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.
engine.execute('''ALTER TABLE coal RENAME COLUMN Entity TO Country;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff428044310>
# Query_51
# Checking whether the column name has been changed or not.
query_table = pd.read_sql_query('''DESCRIBE coal;''', engine)
display(query_table)
| Field | Type | Null | Key | Default | Extra | |
|---|---|---|---|---|---|---|
| 0 | Country | text | YES | None | ||
| 1 | Code | text | YES | None | ||
| 2 | Year | bigint(20) | YES | None | ||
| 3 | Electricity from coal (TWh) | double | YES | None |
Skip to section 8, but do look at Inference in order to see different results specific to Natural Gas Dataset.
# Reading electricity-gas.csv file
df_gas = pd.read_csv("electricity-gas.csv")
#display(df_gas)
# Writing the Gas dataframe into a table
# Reference : - https://www.w3resource.com/pandas/dataframe/dataframe-to_sql.php
df_gas.to_sql("gas", engine, if_exists = 'replace', index=False, method='multi') # Pushes all data to MySql
# Demonstartation of the fact that import has been successful.
df_gas_table = pd.read_sql_table("gas", engine) # Reading the data.
df_gas_table.head(10)
| Entity | Code | Year | Electricity from gas (TWh) | |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2000 | 0.0 |
| 1 | Afghanistan | AFG | 2001 | 0.0 |
| 2 | Afghanistan | AFG | 2002 | 0.0 |
| 3 | Afghanistan | AFG | 2003 | 0.0 |
| 4 | Afghanistan | AFG | 2004 | 0.0 |
| 5 | Afghanistan | AFG | 2005 | 0.0 |
| 6 | Afghanistan | AFG | 2006 | 0.0 |
| 7 | Afghanistan | AFG | 2007 | 0.0 |
| 8 | Afghanistan | AFG | 2008 | 0.0 |
| 9 | Afghanistan | AFG | 2009 | 0.0 |
# Query_52
# Checking those records where 'Code' is NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT `Entity` FROM gas
WHERE `Code` IS NULL;''', engine)
display(query_table)
| Entity | |
|---|---|
| 0 | Africa |
| 1 | Africa (BP) |
| 2 | Asia |
| 3 | Asia Pacific (BP) |
| 4 | CIS (BP) |
| 5 | Europe |
| 6 | Europe (BP) |
| 7 | European Union (27) |
| 8 | G20 (Ember) |
| 9 | G7 (Ember) |
| 10 | High-income countries |
| 11 | Latin America and Caribbean (Ember) |
| 12 | Low-income countries |
| 13 | Lower-middle-income countries |
| 14 | Middle East (BP) |
| 15 | Middle East (Ember) |
| 16 | Non-OECD (BP) |
| 17 | North America |
| 18 | North America (BP) |
| 19 | OECD (BP) |
| 20 | OECD (Ember) |
| 21 | Oceania |
| 22 | Other Africa (BP) |
| 23 | Other Asia Pacific (BP) |
| 24 | Other CIS (BP) |
| 25 | Other Europe (BP) |
| 26 | Other Middle East (BP) |
| 27 | Other Northern Africa (BP) |
| 28 | Other South America (BP) |
| 29 | Other South and Central America (BP) |
| 30 | Other Southern Africa (BP) |
| 31 | South America |
| 32 | South and Central America (BP) |
| 33 | Upper-middle-income countries |
# Query_53
engine.execute('''DELETE FROM gas WHERE `CODE` IS NULL;''')
# Here we are deleting all the rows whose Entity was a Region and not a Country.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff428111c50>
# Query_54
# We shall now confirm that those records have been deleted where Entity IS NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Code`), `Entity` FROM gas;''', engine)
display(query_table)
| Code | Entity | |
|---|---|---|
| 0 | AFG | Afghanistan |
| 1 | ALB | Albania |
| 2 | DZA | Algeria |
| 3 | ASM | American Samoa |
| 4 | AGO | Angola |
| ... | ... | ... |
| 208 | ESH | Western Sahara |
| 209 | OWID_WRL | World |
| 210 | YEM | Yemen |
| 211 | ZMB | Zambia |
| 212 | ZWE | Zimbabwe |
213 rows × 2 columns
# Query_55
query_table = pd.read_sql_query('''SELECT `Entity`, `Code`, CHAR_LENGTH(`CODE`) AS Length
FROM gas
GROUP BY `Entity`
HAVING CHAR_LENGTH(`CODE`) <> 3;''', engine)
# In this SQL query we are looking out for all records where Character length of 'Code' column is not equal to 3.
display(query_table)
| Entity | Code | Length | |
|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 8 |
| 1 | World | OWID_WRL | 8 |
# Query_56
# Seeing those records in its entirety where Code length NOT EQUAL to 3.
query_table = pd.read_sql_query('''SELECT *
FROM gas
WHERE `Entity` IN ('Kosovo', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity from gas (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 0.000000 |
| 1 | Kosovo | OWID_KOS | 2001 | 0.000000 |
| 2 | Kosovo | OWID_KOS | 2002 | 0.000000 |
| 3 | Kosovo | OWID_KOS | 2003 | 0.000000 |
| 4 | Kosovo | OWID_KOS | 2004 | 0.000000 |
| 5 | Kosovo | OWID_KOS | 2005 | 0.000000 |
| 6 | Kosovo | OWID_KOS | 2006 | 0.000000 |
| 7 | Kosovo | OWID_KOS | 2007 | 0.000000 |
| 8 | Kosovo | OWID_KOS | 2008 | 0.000000 |
| 9 | Kosovo | OWID_KOS | 2009 | 0.000000 |
| 10 | Kosovo | OWID_KOS | 2010 | 0.000000 |
| 11 | Kosovo | OWID_KOS | 2011 | 0.000000 |
| 12 | Kosovo | OWID_KOS | 2012 | 0.000000 |
| 13 | Kosovo | OWID_KOS | 2013 | 0.000000 |
| 14 | Kosovo | OWID_KOS | 2014 | 0.000000 |
| 15 | Kosovo | OWID_KOS | 2015 | 0.000000 |
| 16 | Kosovo | OWID_KOS | 2016 | 0.000000 |
| 17 | Kosovo | OWID_KOS | 2017 | 0.000000 |
| 18 | Kosovo | OWID_KOS | 2018 | 0.000000 |
| 19 | Kosovo | OWID_KOS | 2019 | 0.000000 |
| 20 | Kosovo | OWID_KOS | 2020 | 0.000000 |
| 21 | World | OWID_WRL | 1985 | 1426.303955 |
| 22 | World | OWID_WRL | 1986 | 1432.666138 |
| 23 | World | OWID_WRL | 1987 | 1516.490234 |
| 24 | World | OWID_WRL | 1988 | 1540.937866 |
| 25 | World | OWID_WRL | 1989 | 1728.516479 |
| 26 | World | OWID_WRL | 1990 | 1789.731567 |
| 27 | World | OWID_WRL | 1991 | 1815.254883 |
| 28 | World | OWID_WRL | 1992 | 1829.405884 |
| 29 | World | OWID_WRL | 1993 | 1863.836914 |
| 30 | World | OWID_WRL | 1994 | 1925.131226 |
| 31 | World | OWID_WRL | 1995 | 2036.958984 |
| 32 | World | OWID_WRL | 1996 | 2100.884521 |
| 33 | World | OWID_WRL | 1997 | 2271.176514 |
| 34 | World | OWID_WRL | 1998 | 2408.302979 |
| 35 | World | OWID_WRL | 1999 | 2599.874023 |
| 36 | World | OWID_WRL | 2000 | 2772.827881 |
| 37 | World | OWID_WRL | 2001 | 2950.852295 |
| 38 | World | OWID_WRL | 2002 | 3155.671631 |
| 39 | World | OWID_WRL | 2003 | 3308.574219 |
| 40 | World | OWID_WRL | 2004 | 3563.925049 |
| 41 | World | OWID_WRL | 2005 | 3762.713623 |
| 42 | World | OWID_WRL | 2006 | 3977.849365 |
| 43 | World | OWID_WRL | 2007 | 4302.881836 |
| 44 | World | OWID_WRL | 2008 | 4447.398926 |
| 45 | World | OWID_WRL | 2009 | 4468.451172 |
| 46 | World | OWID_WRL | 2010 | 4888.422363 |
| 47 | World | OWID_WRL | 2011 | 4949.531738 |
| 48 | World | OWID_WRL | 2012 | 5244.609375 |
| 49 | World | OWID_WRL | 2013 | 5127.771973 |
| 50 | World | OWID_WRL | 2014 | 5278.891113 |
| 51 | World | OWID_WRL | 2015 | 5622.241211 |
| 52 | World | OWID_WRL | 2016 | 5858.449707 |
| 53 | World | OWID_WRL | 2017 | 5957.887207 |
| 54 | World | OWID_WRL | 2018 | 6206.647949 |
| 55 | World | OWID_WRL | 2019 | 6421.006348 |
| 56 | World | OWID_WRL | 2020 | 6371.650391 |
| 57 | World | OWID_WRL | 2021 | 6518.495605 |
# Query_57
engine.execute('''DELETE FROM gas WHERE `Entity` IN ('Kosovo', 'World');''')
# Here we are deleting all the rows for Kosovo and World for the reasons as stated above.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff428111150>
# Query_58
# The query given below shall help us give distinct Entites and the sum of electricity they have generated from gas power.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Entity`), SUM(`Electricity from gas (TWh)`) AS Total_Gas_Energy
FROM gas
GROUP BY `Entity`
HAVING Total_Gas_Energy <> 0
ORDER BY Total_Gas_Energy DESC;''', engine)
display(query_table)
| Entity | Total_Gas_Energy | |
|---|---|---|
| 0 | United States | 31484.557709 |
| 1 | Russia | 16958.225281 |
| 2 | Japan | 10093.155701 |
| 3 | Iran | 4303.379135 |
| 4 | United Kingdom | 3697.165621 |
| ... | ... | ... |
| 104 | El Salvador | 5.190000 |
| 105 | Palestine | 2.910000 |
| 106 | Tajikistan | 2.600000 |
| 107 | Panama | 1.820000 |
| 108 | Bosnia and Herzegovina | 0.190000 |
109 rows × 2 columns
There are 109 distinct countries! which has generated electricity from Dirty-fuel source Gas Power.
Till now we have observed the following: -
Countries which utilises solar: - 153
Countries which utilises Coal: - 86
# Query_59
# The query given below shall tell us the starting year (oldest year) and the latest year (that is available in the dataset) of gas energy production.
query_table = pd.read_sql_query('''SELECT DISTINCT (`Entity`), MIN(`Year`) AS Oldest_year, MAX(`YEAR`) AS Latest_year, MAX(`YEAR`) - MIN(`Year`) AS Duration
FROM gas
WHERE `Electricity from gas (TWh)` <> 0
GROUP BY `Entity`
ORDER BY Duration DESC, SUM(`Electricity from gas (TWh)`) DESC;''', engine)
display(query_table)
| Entity | Oldest_year | Latest_year | Duration | |
|---|---|---|---|---|
| 0 | United States | 1985 | 2021 | 36 |
| 1 | Russia | 1985 | 2021 | 36 |
| 2 | Japan | 1985 | 2021 | 36 |
| 3 | Iran | 1985 | 2021 | 36 |
| 4 | United Kingdom | 1985 | 2021 | 36 |
| ... | ... | ... | ... | ... |
| 104 | Malta | 2017 | 2021 | 4 |
| 105 | Angola | 2017 | 2020 | 3 |
| 106 | Mozambique | 2018 | 2020 | 2 |
| 107 | Tanzania | 2018 | 2020 | 2 |
| 108 | Panama | 2018 | 2020 | 2 |
109 rows × 4 columns
# Query_60
# We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.
engine.execute('''ALTER TABLE gas RENAME COLUMN Entity TO Country;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff438f1bcd0>
# Query_61
# Checking whether the column name has been changed or not.
query_table = pd.read_sql_query('''DESCRIBE gas;''', engine)
display(query_table)
| Field | Type | Null | Key | Default | Extra | |
|---|---|---|---|---|---|---|
| 0 | Country | text | YES | None | ||
| 1 | Code | text | YES | None | ||
| 2 | Year | bigint(20) | YES | None | ||
| 3 | Electricity from gas (TWh) | double | YES | None |
Skip to section 8, but do look at Inference in order to see different results specific to Oil Dataset.
# Reading electricity-oil.csv file
df_oil = pd.read_csv("electricity-oil.csv")
#display(df_oil)
# Writing the Oil dataframe into a table
# Reference : - https://www.w3resource.com/pandas/dataframe/dataframe-to_sql.php
df_oil.to_sql("oil", engine, if_exists = 'replace', index=False, method='multi') # Pushes all data to MySql
# Demonstartation of the fact that import has been successful.
df_oil_table = pd.read_sql_table("oil", engine) # Reading the data.
df_oil_table.head(10)
| Entity | Code | Year | Electricity from oil (TWh) | |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2000 | 0.16 |
| 1 | Afghanistan | AFG | 2001 | 0.09 |
| 2 | Afghanistan | AFG | 2002 | 0.13 |
| 3 | Afghanistan | AFG | 2003 | 0.31 |
| 4 | Afghanistan | AFG | 2004 | 0.33 |
| 5 | Afghanistan | AFG | 2005 | 0.34 |
| 6 | Afghanistan | AFG | 2006 | 0.20 |
| 7 | Afghanistan | AFG | 2007 | 0.20 |
| 8 | Afghanistan | AFG | 2008 | 0.19 |
| 9 | Afghanistan | AFG | 2009 | 0.16 |
# Query_62
# Checking those records where 'Code' is NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT `Entity` FROM oil
WHERE `Code` IS NULL;''', engine)
display(query_table)
| Entity | |
|---|---|
| 0 | Africa |
| 1 | Africa (BP) |
| 2 | Asia |
| 3 | Asia Pacific (BP) |
| 4 | CIS (BP) |
| 5 | Europe |
| 6 | Europe (BP) |
| 7 | European Union (27) |
| 8 | G20 (Ember) |
| 9 | G7 (Ember) |
| 10 | High-income countries |
| 11 | Latin America and Caribbean (Ember) |
| 12 | Low-income countries |
| 13 | Lower-middle-income countries |
| 14 | Middle East (BP) |
| 15 | Middle East (Ember) |
| 16 | Non-OECD (BP) |
| 17 | North America |
| 18 | North America (BP) |
| 19 | OECD (BP) |
| 20 | OECD (Ember) |
| 21 | Oceania |
| 22 | Other Africa (BP) |
| 23 | Other Asia Pacific (BP) |
| 24 | Other CIS (BP) |
| 25 | Other Europe (BP) |
| 26 | Other Middle East (BP) |
| 27 | Other Northern Africa (BP) |
| 28 | Other South America (BP) |
| 29 | Other South and Central America (BP) |
| 30 | Other Southern Africa (BP) |
| 31 | South America |
| 32 | South and Central America (BP) |
| 33 | Upper-middle-income countries |
# Query_63
engine.execute('''DELETE FROM oil WHERE `CODE` IS NULL;''')
# Here we are deleting all the rows whose Entity was a Region and not a Country.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff428111b50>
# Query_64
# We shall now confirm that those records have been deleted where Entity IS NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Code`), `Entity` FROM oil;''', engine)
display(query_table)
| Code | Entity | |
|---|---|---|
| 0 | AFG | Afghanistan |
| 1 | ALB | Albania |
| 2 | DZA | Algeria |
| 3 | ASM | American Samoa |
| 4 | AGO | Angola |
| ... | ... | ... |
| 208 | ESH | Western Sahara |
| 209 | OWID_WRL | World |
| 210 | YEM | Yemen |
| 211 | ZMB | Zambia |
| 212 | ZWE | Zimbabwe |
213 rows × 2 columns
# Query_65
query_table = pd.read_sql_query('''SELECT `Entity`, `Code`, CHAR_LENGTH(`CODE`) AS Length
FROM oil
GROUP BY `Entity`
HAVING CHAR_LENGTH(`CODE`) <> 3;''', engine)
# In this SQL query we are looking out for all records where Character length of 'Code' column is not equal to 3.
display(query_table)
| Entity | Code | Length | |
|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 8 |
| 1 | World | OWID_WRL | 8 |
# Query_66
# Seeing those records in its entirety where Code length NOT EQUAL to 3.
query_table = pd.read_sql_query('''SELECT *
FROM oil
WHERE `Entity` IN ('Kosovo', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity from oil (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 0.020000 |
| 1 | Kosovo | OWID_KOS | 2001 | 0.020000 |
| 2 | Kosovo | OWID_KOS | 2002 | 0.020000 |
| 3 | Kosovo | OWID_KOS | 2003 | 0.030000 |
| 4 | Kosovo | OWID_KOS | 2004 | 0.030000 |
| 5 | Kosovo | OWID_KOS | 2005 | 0.030000 |
| 6 | Kosovo | OWID_KOS | 2006 | 0.030000 |
| 7 | Kosovo | OWID_KOS | 2007 | 0.030000 |
| 8 | Kosovo | OWID_KOS | 2008 | 0.030000 |
| 9 | Kosovo | OWID_KOS | 2009 | 0.010000 |
| 10 | Kosovo | OWID_KOS | 2010 | 0.020000 |
| 11 | Kosovo | OWID_KOS | 2011 | 0.020000 |
| 12 | Kosovo | OWID_KOS | 2012 | 0.010000 |
| 13 | Kosovo | OWID_KOS | 2013 | 0.010000 |
| 14 | Kosovo | OWID_KOS | 2014 | 0.010000 |
| 15 | Kosovo | OWID_KOS | 2015 | 0.010000 |
| 16 | Kosovo | OWID_KOS | 2016 | 0.010000 |
| 17 | Kosovo | OWID_KOS | 2017 | 0.010000 |
| 18 | Kosovo | OWID_KOS | 2018 | 0.010000 |
| 19 | Kosovo | OWID_KOS | 2019 | 0.020000 |
| 20 | Kosovo | OWID_KOS | 2020 | 0.010000 |
| 21 | World | OWID_WRL | 1985 | 1108.860596 |
| 22 | World | OWID_WRL | 1986 | 1166.325562 |
| 23 | World | OWID_WRL | 1987 | 1181.093994 |
| 24 | World | OWID_WRL | 1988 | 1254.357300 |
| 25 | World | OWID_WRL | 1989 | 1346.962891 |
| 26 | World | OWID_WRL | 1990 | 1364.746948 |
| 27 | World | OWID_WRL | 1991 | 1350.083008 |
| 28 | World | OWID_WRL | 1992 | 1328.291504 |
| 29 | World | OWID_WRL | 1993 | 1266.685791 |
| 30 | World | OWID_WRL | 1994 | 1302.172363 |
| 31 | World | OWID_WRL | 1995 | 1259.934448 |
| 32 | World | OWID_WRL | 1996 | 1246.048218 |
| 33 | World | OWID_WRL | 1997 | 1244.686401 |
| 34 | World | OWID_WRL | 1998 | 1295.091797 |
| 35 | World | OWID_WRL | 1999 | 1267.382690 |
| 36 | World | OWID_WRL | 2000 | 1242.589355 |
| 37 | World | OWID_WRL | 2001 | 1207.649292 |
| 38 | World | OWID_WRL | 2002 | 1188.472656 |
| 39 | World | OWID_WRL | 2003 | 1193.843994 |
| 40 | World | OWID_WRL | 2004 | 1169.327515 |
| 41 | World | OWID_WRL | 2005 | 1173.346436 |
| 42 | World | OWID_WRL | 2006 | 1067.735840 |
| 43 | World | OWID_WRL | 2007 | 1086.862427 |
| 44 | World | OWID_WRL | 2008 | 1062.801514 |
| 45 | World | OWID_WRL | 2009 | 994.517090 |
| 46 | World | OWID_WRL | 2010 | 952.510437 |
| 47 | World | OWID_WRL | 2011 | 1040.999634 |
| 48 | World | OWID_WRL | 2012 | 1137.178101 |
| 49 | World | OWID_WRL | 2013 | 1076.336060 |
| 50 | World | OWID_WRL | 2014 | 1027.133057 |
| 51 | World | OWID_WRL | 2015 | 1021.654480 |
| 52 | World | OWID_WRL | 2016 | 949.481262 |
| 53 | World | OWID_WRL | 2017 | 862.205017 |
| 54 | World | OWID_WRL | 2018 | 776.462952 |
| 55 | World | OWID_WRL | 2019 | 700.608704 |
| 56 | World | OWID_WRL | 2020 | 661.658630 |
| 57 | World | OWID_WRL | 2021 | 720.285828 |
# Query_66
# Seeing those records in its entirety where Code length NOT EQUAL to 3.
query_table = pd.read_sql_query('''SELECT *
FROM oil
WHERE `Entity` IN ('Kosovo', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity from oil (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 0.020000 |
| 1 | Kosovo | OWID_KOS | 2001 | 0.020000 |
| 2 | Kosovo | OWID_KOS | 2002 | 0.020000 |
| 3 | Kosovo | OWID_KOS | 2003 | 0.030000 |
| 4 | Kosovo | OWID_KOS | 2004 | 0.030000 |
| 5 | Kosovo | OWID_KOS | 2005 | 0.030000 |
| 6 | Kosovo | OWID_KOS | 2006 | 0.030000 |
| 7 | Kosovo | OWID_KOS | 2007 | 0.030000 |
| 8 | Kosovo | OWID_KOS | 2008 | 0.030000 |
| 9 | Kosovo | OWID_KOS | 2009 | 0.010000 |
| 10 | Kosovo | OWID_KOS | 2010 | 0.020000 |
| 11 | Kosovo | OWID_KOS | 2011 | 0.020000 |
| 12 | Kosovo | OWID_KOS | 2012 | 0.010000 |
| 13 | Kosovo | OWID_KOS | 2013 | 0.010000 |
| 14 | Kosovo | OWID_KOS | 2014 | 0.010000 |
| 15 | Kosovo | OWID_KOS | 2015 | 0.010000 |
| 16 | Kosovo | OWID_KOS | 2016 | 0.010000 |
| 17 | Kosovo | OWID_KOS | 2017 | 0.010000 |
| 18 | Kosovo | OWID_KOS | 2018 | 0.010000 |
| 19 | Kosovo | OWID_KOS | 2019 | 0.020000 |
| 20 | Kosovo | OWID_KOS | 2020 | 0.010000 |
| 21 | World | OWID_WRL | 1985 | 1108.860596 |
| 22 | World | OWID_WRL | 1986 | 1166.325562 |
| 23 | World | OWID_WRL | 1987 | 1181.093994 |
| 24 | World | OWID_WRL | 1988 | 1254.357300 |
| 25 | World | OWID_WRL | 1989 | 1346.962891 |
| 26 | World | OWID_WRL | 1990 | 1364.746948 |
| 27 | World | OWID_WRL | 1991 | 1350.083008 |
| 28 | World | OWID_WRL | 1992 | 1328.291504 |
| 29 | World | OWID_WRL | 1993 | 1266.685791 |
| 30 | World | OWID_WRL | 1994 | 1302.172363 |
| 31 | World | OWID_WRL | 1995 | 1259.934448 |
| 32 | World | OWID_WRL | 1996 | 1246.048218 |
| 33 | World | OWID_WRL | 1997 | 1244.686401 |
| 34 | World | OWID_WRL | 1998 | 1295.091797 |
| 35 | World | OWID_WRL | 1999 | 1267.382690 |
| 36 | World | OWID_WRL | 2000 | 1242.589355 |
| 37 | World | OWID_WRL | 2001 | 1207.649292 |
| 38 | World | OWID_WRL | 2002 | 1188.472656 |
| 39 | World | OWID_WRL | 2003 | 1193.843994 |
| 40 | World | OWID_WRL | 2004 | 1169.327515 |
| 41 | World | OWID_WRL | 2005 | 1173.346436 |
| 42 | World | OWID_WRL | 2006 | 1067.735840 |
| 43 | World | OWID_WRL | 2007 | 1086.862427 |
| 44 | World | OWID_WRL | 2008 | 1062.801514 |
| 45 | World | OWID_WRL | 2009 | 994.517090 |
| 46 | World | OWID_WRL | 2010 | 952.510437 |
| 47 | World | OWID_WRL | 2011 | 1040.999634 |
| 48 | World | OWID_WRL | 2012 | 1137.178101 |
| 49 | World | OWID_WRL | 2013 | 1076.336060 |
| 50 | World | OWID_WRL | 2014 | 1027.133057 |
| 51 | World | OWID_WRL | 2015 | 1021.654480 |
| 52 | World | OWID_WRL | 2016 | 949.481262 |
| 53 | World | OWID_WRL | 2017 | 862.205017 |
| 54 | World | OWID_WRL | 2018 | 776.462952 |
| 55 | World | OWID_WRL | 2019 | 700.608704 |
| 56 | World | OWID_WRL | 2020 | 661.658630 |
| 57 | World | OWID_WRL | 2021 | 720.285828 |
# Query_67
engine.execute('''DELETE FROM oil WHERE `Entity` IN ('Kosovo', 'World');''')
# Here we are deleting all the rows for Kosovo and World for the reasons as stated above.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff438f1b650>
# Query_68
# The query given below shall help us give distinct Entites and the sum of electricity they have generated from oil power.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Entity`), SUM(`Electricity from oil (TWh)`) AS Total_Oil_Energy
FROM oil
GROUP BY `Entity`
HAVING Total_Oil_Energy <> 0
ORDER BY Total_Oil_Energy DESC;''', engine)
display(query_table)
| Entity | Total_Oil_Energy | |
|---|---|---|
| 0 | Japan | 6274.642496 |
| 1 | Saudi Arabia | 3389.124531 |
| 2 | United States | 3160.069517 |
| 3 | Italy | 2278.027412 |
| 4 | Mexico | 2234.671963 |
| ... | ... | ... |
| 187 | Moldova | 0.160000 |
| 188 | United Arab Emirates | 0.096014 |
| 189 | Nigeria | 0.090000 |
| 190 | Iceland | 0.060000 |
| 191 | Paraguay | 0.050000 |
192 rows × 2 columns
There are 192 distinct countries! which has generated electricity from Dirty-fuel source Oil Power.
Till now we have observed the following: -
Countries which utilises solar: - 153
Countries which utilises Coal: - 86
# Query_69
# The query given below shall tell us the starting year (oldest year) and the latest year (that is available in the dataset) of oil energy production.
query_table = pd.read_sql_query('''SELECT DISTINCT (`Entity`), MIN(`Year`) AS Oldest_year, MAX(`YEAR`) AS Latest_year, MAX(`YEAR`) - MIN(`Year`) AS Duration
FROM oil
WHERE `Electricity from oil (TWh)` <> 0
GROUP BY `Entity`
ORDER BY Duration DESC, SUM(`Electricity from oil (TWh)`) DESC;''', engine)
display(query_table)
| Entity | Oldest_year | Latest_year | Duration | |
|---|---|---|---|---|
| 0 | Japan | 1985 | 2021 | 36 |
| 1 | Saudi Arabia | 1985 | 2021 | 36 |
| 2 | United States | 1985 | 2021 | 36 |
| 3 | Italy | 1985 | 2021 | 36 |
| 4 | Mexico | 1985 | 2021 | 36 |
| ... | ... | ... | ... | ... |
| 187 | Moldova | 2010 | 2018 | 8 |
| 188 | Laos | 2000 | 2003 | 3 |
| 189 | Paraguay | 2000 | 2003 | 3 |
| 190 | Nigeria | 2019 | 2020 | 1 |
| 191 | Timor | 2020 | 2020 | 0 |
192 rows × 4 columns
# Query_70
# We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.
engine.execute('''ALTER TABLE oil RENAME COLUMN Entity TO Country;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff438f3ce90>
# Query_71
# Checking whether the column name has been changed or not.
query_table = pd.read_sql_query('''DESCRIBE oil;''', engine)
display(query_table)
| Field | Type | Null | Key | Default | Extra | |
|---|---|---|---|---|---|---|
| 0 | Country | text | YES | None | ||
| 1 | Code | text | YES | None | ||
| 2 | Year | bigint(20) | YES | None | ||
| 3 | Electricity from oil (TWh) | double | YES | None |
We shall now be working with a dataset which tells us the total electricity geneated for a paticular country for a paticular year.
The fuel used for electricity generation in this dataset is Fossil Fuels (Oil, Gas, Coal) + Non Fossil Fuels (Nulcear + Hydro + Solar + Wind).
# Reading energy.csv file
df_total_electricity_generated = pd.read_csv("energy.csv")
#display(df_total_electricity_generated)
# Writing the Total Electricity Generated dataframe into a table
# Reference : - https://www.w3resource.com/pandas/dataframe/dataframe-to_sql.php
df_total_electricity_generated.to_sql("total_electricity", engine, if_exists = 'replace', index=False, method='multi') # Pushes all data to MySql
# Demonstartation of the fact that import has been successful.
df_total_electricity_table = pd.read_sql_table("total_electricity", engine) # Reading the data.
df_total_electricity_table.head(10)
| Entity | Code | Year | Electricity generation (TWh) | |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2000 | 0.47 |
| 1 | Afghanistan | AFG | 2001 | 0.59 |
| 2 | Afghanistan | AFG | 2002 | 0.69 |
| 3 | Afghanistan | AFG | 2003 | 0.94 |
| 4 | Afghanistan | AFG | 2004 | 0.89 |
| 5 | Afghanistan | AFG | 2005 | 0.93 |
| 6 | Afghanistan | AFG | 2006 | 0.84 |
| 7 | Afghanistan | AFG | 2007 | 0.95 |
| 8 | Afghanistan | AFG | 2008 | 0.73 |
| 9 | Afghanistan | AFG | 2009 | 0.94 |
The columns of energy.csv are similar to that of nuclear.csv. For their interpretation please look before Query 1.
We have to do data cleaning for energy.csv as well, just like we did for nuclear and hydro.
Here are the following steps which we are going to execute: -
# Query_72
# Checking those records where 'Code' is NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT `Entity` FROM total_electricity
WHERE `Code` IS NULL;''', engine)
display(query_table)
| Entity | |
|---|---|
| 0 | Africa |
| 1 | Africa (BP) |
| 2 | Asia |
| 3 | Asia Pacific (BP) |
| 4 | CIS (BP) |
| 5 | Central America (BP) |
| 6 | Eastern Africa (BP) |
| 7 | Europe |
| 8 | Europe (BP) |
| 9 | European Union (27) |
| 10 | G20 (Ember) |
| 11 | G7 (Ember) |
| 12 | High-income countries |
| 13 | Latin America and Caribbean (Ember) |
| 14 | Low-income countries |
| 15 | Lower-middle-income countries |
| 16 | Middle Africa (BP) |
| 17 | Middle East (BP) |
| 18 | Middle East (Ember) |
| 19 | Non-OECD (BP) |
| 20 | North America |
| 21 | North America (BP) |
| 22 | OECD (BP) |
| 23 | OECD (Ember) |
| 24 | Oceania |
| 25 | Other Africa (BP) |
| 26 | Other Asia Pacific (BP) |
| 27 | Other CIS (BP) |
| 28 | Other Caribbean (BP) |
| 29 | Other Europe (BP) |
| 30 | Other Middle East (BP) |
| 31 | Other Northern Africa (BP) |
| 32 | Other South America (BP) |
| 33 | Other South and Central America (BP) |
| 34 | Other Southern Africa (BP) |
| 35 | South America |
| 36 | South and Central America (BP) |
| 37 | Upper-middle-income countries |
| 38 | Western Africa (BP) |
# Query_73
engine.execute('''DELETE FROM total_electricity WHERE `CODE` IS NULL;''')
# Here we are deleting all the rows whose Entity was a Region and not a Country.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bc8ddd0>
# Query_74
# We shall now confirm that those records have been deleted where Entity IS NULL.
query_table = pd.read_sql_query('''SELECT DISTINCT(`Code`), `Entity` FROM total_electricity;''', engine)
display(query_table)
| Code | Entity | |
|---|---|---|
| 0 | AFG | Afghanistan |
| 1 | ALB | Albania |
| 2 | DZA | Algeria |
| 3 | ASM | American Samoa |
| 4 | AGO | Angola |
| ... | ... | ... |
| 211 | ESH | Western Sahara |
| 212 | OWID_WRL | World |
| 213 | YEM | Yemen |
| 214 | ZMB | Zambia |
| 215 | ZWE | Zimbabwe |
216 rows × 2 columns
From the list above, we can confirm that no NULL value is left in the table.
# Query_75
query_table = pd.read_sql_query('''SELECT `Entity`, `Code`, CHAR_LENGTH(`CODE`) AS Length
FROM total_electricity
GROUP BY `Entity`
HAVING CHAR_LENGTH(`CODE`) <> 3;''', engine)
# In this SQL query we are looking out for all records where Character length of 'Code' column is not equal to 3.
display(query_table)
| Entity | Code | Length | |
|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 8 |
| 1 | World | OWID_WRL | 8 |
The result is not same for the above query as we have seen for nuclear and hydro. Why is this the case? Why is USSR missing? We must investigate. But first we must delete the rows for Kosovo and World. The reason for doing this is given below Query 6.
# Query_76
# Seeing those records in its entirety where Code length NOT EQUAL to 3.
query_table = pd.read_sql_query('''SELECT *
FROM total_electricity
WHERE `Entity` IN ('Kosovo', 'World');''', engine)
display(query_table)
| Entity | Code | Year | Electricity generation (TWh) | |
|---|---|---|---|---|
| 0 | Kosovo | OWID_KOS | 2000 | 2.960 |
| 1 | Kosovo | OWID_KOS | 2001 | 3.730 |
| 2 | Kosovo | OWID_KOS | 2002 | 3.710 |
| 3 | Kosovo | OWID_KOS | 2003 | 3.630 |
| 4 | Kosovo | OWID_KOS | 2004 | 4.080 |
| 5 | Kosovo | OWID_KOS | 2005 | 4.460 |
| 6 | Kosovo | OWID_KOS | 2006 | 4.430 |
| 7 | Kosovo | OWID_KOS | 2007 | 4.830 |
| 8 | Kosovo | OWID_KOS | 2008 | 5.160 |
| 9 | Kosovo | OWID_KOS | 2009 | 4.970 |
| 10 | Kosovo | OWID_KOS | 2010 | 5.170 |
| 11 | Kosovo | OWID_KOS | 2011 | 5.800 |
| 12 | Kosovo | OWID_KOS | 2012 | 5.940 |
| 13 | Kosovo | OWID_KOS | 2013 | 6.520 |
| 14 | Kosovo | OWID_KOS | 2014 | 5.430 |
| 15 | Kosovo | OWID_KOS | 2015 | 6.110 |
| 16 | Kosovo | OWID_KOS | 2016 | 5.980 |
| 17 | Kosovo | OWID_KOS | 2017 | 5.920 |
| 18 | Kosovo | OWID_KOS | 2018 | 5.910 |
| 19 | Kosovo | OWID_KOS | 2019 | 6.350 |
| 20 | Kosovo | OWID_KOS | 2020 | 6.700 |
| 21 | World | OWID_WRL | 1985 | 9883.152 |
| 22 | World | OWID_WRL | 1986 | 10178.033 |
| 23 | World | OWID_WRL | 1987 | 10667.898 |
| 24 | World | OWID_WRL | 1988 | 11137.956 |
| 25 | World | OWID_WRL | 1989 | 11656.966 |
| 26 | World | OWID_WRL | 1990 | 11961.082 |
| 27 | World | OWID_WRL | 1991 | 12222.666 |
| 28 | World | OWID_WRL | 1992 | 12335.899 |
| 29 | World | OWID_WRL | 1993 | 12599.419 |
| 30 | World | OWID_WRL | 1994 | 12923.792 |
| 31 | World | OWID_WRL | 1995 | 13382.480 |
| 32 | World | OWID_WRL | 1996 | 13797.107 |
| 33 | World | OWID_WRL | 1997 | 14128.893 |
| 34 | World | OWID_WRL | 1998 | 14511.115 |
| 35 | World | OWID_WRL | 1999 | 14925.670 |
| 36 | World | OWID_WRL | 2000 | 15564.162 |
| 37 | World | OWID_WRL | 2001 | 15799.781 |
| 38 | World | OWID_WRL | 2002 | 16357.043 |
| 39 | World | OWID_WRL | 2003 | 16935.316 |
| 40 | World | OWID_WRL | 2004 | 17738.584 |
| 41 | World | OWID_WRL | 2005 | 18465.418 |
| 42 | World | OWID_WRL | 2006 | 19167.045 |
| 43 | World | OWID_WRL | 2007 | 20059.064 |
| 44 | World | OWID_WRL | 2008 | 20436.479 |
| 45 | World | OWID_WRL | 2009 | 20278.684 |
| 46 | World | OWID_WRL | 2010 | 21581.271 |
| 47 | World | OWID_WRL | 2011 | 22268.924 |
| 48 | World | OWID_WRL | 2012 | 22817.488 |
| 49 | World | OWID_WRL | 2013 | 23452.438 |
| 50 | World | OWID_WRL | 2014 | 24049.812 |
| 51 | World | OWID_WRL | 2015 | 24292.012 |
| 52 | World | OWID_WRL | 2016 | 24924.240 |
| 53 | World | OWID_WRL | 2017 | 25647.717 |
| 54 | World | OWID_WRL | 2018 | 26677.312 |
| 55 | World | OWID_WRL | 2019 | 27036.617 |
| 56 | World | OWID_WRL | 2020 | 26889.158 |
| 57 | World | OWID_WRL | 2021 | 28466.295 |
# Query_77
engine.execute('''DELETE FROM total_electricity WHERE `Entity` IN ('Kosovo', 'World');''')
# Here we are deleting all the rows for Kosovo and World for the reasons as stated above.
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff43997edd0>
After looking at Query 6 and Query 16, It has been discovered that the values of USSR were from 1965-1984. After that we get the values of Post Soviet countries like Russia, Ukraine etc.
We must investigate the Minimum and Maximum values of Year column for Nuclear, Hydro, other fuel sources dataset and Total_Electricity Dataset.
# Query_78
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Nuclear, MAX(`Year`) AS Latest_Year_Available_Nuclear
FROM nuclear;''', engine)
display(query_table)
| Oldest_Year_Available_Nuclear | Latest_Year_Available_Nuclear | |
|---|---|---|
| 0 | 1965 | 2021 |
# Query_79
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Hydro, MAX(`Year`) AS Latest_Year_Available_Hydro
FROM hydro;''', engine)
display(query_table)
| Oldest_Year_Available_Hydro | Latest_Year_Available_Hydro | |
|---|---|---|
| 0 | 1965 | 2021 |
# Query_80
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Wind, MAX(`Year`) AS Latest_Year_Available_Wind
FROM wind;''', engine)
display(query_table)
| Oldest_Year_Available_Wind | Latest_Year_Available_Wind | |
|---|---|---|
| 0 | 1965 | 2021 |
# Query_81
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Solar, MAX(`Year`) AS Latest_Year_Available_Solar
FROM solar;''', engine)
display(query_table)
| Oldest_Year_Available_Solar | Latest_Year_Available_Solar | |
|---|---|---|
| 0 | 1965 | 2021 |
# Query_82
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Coal, MAX(`Year`) AS Latest_Year_Available_Coal
FROM coal;''', engine)
display(query_table)
| Oldest_Year_Available_Coal | Latest_Year_Available_Coal | |
|---|---|---|
| 0 | 1985 | 2021 |
# Query_83
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Gas, MAX(`Year`) AS Latest_Year_Available_Gas
FROM gas;''', engine)
display(query_table)
| Oldest_Year_Available_Gas | Latest_Year_Available_Gas | |
|---|---|---|
| 0 | 1985 | 2021 |
# Query_84
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Oil, MAX(`Year`) AS Latest_Year_Available_Oil
FROM oil;''', engine)
display(query_table)
| Oldest_Year_Available_Oil | Latest_Year_Available_Oil | |
|---|---|---|
| 0 | 1985 | 2021 |
# Query_85
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Total, MAX(`Year`) AS Latest_Year_Available_Total
FROM total_electricity;''', engine)
display(query_table)
| Oldest_Year_Available_Total | Latest_Year_Available_Total | |
|---|---|---|
| 0 | 1985 | 2021 |
Since the dataset available for Total Electricity is only available from 1985-2021 as opposed to 1965-2021, therefore we must delete the records in Nuclear, Hydro, Wind, Solar dataset for years 1965-1984.
We are doing this because one of our guiding questions is to take out: -
Clean Energy % = (Nuclear + Hydro + Wind + Solar)/ Total Electricity Generated
Since we only have Total Electricity generated from 1985-2021, we shall now only focus on those 36 years.
# Deleting the rows where year is from 1965-1984 in Nuclear, Hydro, Wind and Solar dataset.
# Query_86
engine.execute('''DELETE FROM nuclear WHERE `Year` BETWEEN 1965 AND 1984;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46c04ef90>
# Query_87
engine.execute('''DELETE FROM hydro WHERE `Year` BETWEEN 1965 AND 1984;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff439a18190>
# Query_88
engine.execute('''DELETE FROM wind WHERE `Year` BETWEEN 1965 AND 1984;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46c04e350>
# Query_89
engine.execute('''DELETE FROM solar WHERE `Year` BETWEEN 1965 AND 1984;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff439a0d590>
Now making sure that the year range in both Nuclear and Hydro table is from 1985 - 2021.
# Query_90
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Nuclear, MAX(`Year`) AS Latest_Year_Available_Nuclear
FROM nuclear;''', engine)
display(query_table)
| Oldest_Year_Available_Nuclear | Latest_Year_Available_Nuclear | |
|---|---|---|
| 0 | 1985 | 2021 |
# Query_91
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Hydro, MAX(`Year`) AS Latest_Year_Available_Hydro
FROM hydro;''', engine)
display(query_table)
| Oldest_Year_Available_Hydro | Latest_Year_Available_Hydro | |
|---|---|---|
| 0 | 1985 | 2021 |
# Query_92
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Wind, MAX(`Year`) AS Latest_Year_Available_Wind
FROM wind;''', engine)
display(query_table)
| Oldest_Year_Available_Wind | Latest_Year_Available_Wind | |
|---|---|---|
| 0 | 1985 | 2021 |
# Query_93
query_table = pd.read_sql_query('''SELECT MIN(`Year`) AS Oldest_Year_Available_Solar, MAX(`Year`) AS Latest_Year_Available_Solar
FROM solar;''', engine)
display(query_table)
| Oldest_Year_Available_Solar | Latest_Year_Available_Solar | |
|---|---|---|
| 0 | 1985 | 2021 |
Your Datasets are now more or less synchronised.
Also we also have taken care of the USSR Problem and investigated successfully.
# Query_94
# We shall now change the column name from Entity to Country because now our Entity column only consists of various countries and no regions.
engine.execute('''ALTER TABLE total_electricity RENAME COLUMN Entity TO Country;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff438f72050>
# Query_95
# Checking whether the column name has been changed or not.
query_table = pd.read_sql_query('''DESCRIBE total_electricity;''', engine)
display(query_table)
| Field | Type | Null | Key | Default | Extra | |
|---|---|---|---|---|---|---|
| 0 | Country | text | YES | None | ||
| 1 | Code | text | YES | None | ||
| 2 | Year | bigint(20) | YES | None | ||
| 3 | Electricity generation (TWh) | double | YES | None |
Now we would investigate the total electricity usage of each and every country with duration and average consumption per year.
# Query_96
query_table = pd.read_sql_query('''SELECT DISTINCT(`Country`),
SUM(`Electricity generation (TWh)`) AS Total_Electricity_Generated,
MAX(`YEAR`) - MIN(`Year`) AS Duration,
(SUM(`Electricity generation (TWh)`)/ (MAX(`YEAR`) - MIN(`Year`))) AS Average_Consumption_Per_Year
FROM total_electricity
GROUP BY `Country`
HAVING Total_Electricity_Generated <> 0
ORDER BY Duration DESC, Total_Electricity_Generated DESC;''', engine)
display(query_table)
| Country | Total_Electricity_Generated | Duration | Average_Consumption_Per_Year | |
|---|---|---|---|---|
| 0 | United States | 144090.67250 | 36 | 4002.518681 |
| 1 | China | 111040.88900 | 36 | 3084.469139 |
| 2 | Japan | 37490.91565 | 36 | 1041.414324 |
| 3 | Russia | 36870.26270 | 36 | 1024.173964 |
| 4 | India | 28417.13628 | 36 | 789.364897 |
| ... | ... | ... | ... | ... |
| 205 | Montserrat | 0.39000 | 20 | 0.019500 |
| 206 | Falkland Islands | 0.35000 | 20 | 0.017500 |
| 207 | Timor | 0.50000 | 17 | 0.029412 |
| 208 | Montenegro | 52.90000 | 16 | 3.306250 |
| 209 | South Sudan | 4.49000 | 8 | 0.561250 |
210 rows × 4 columns
In the above Query, there is a HAVING Condition: -
HAVING Total_Electricity_Generated <> 0.
Let us now look at the countries for whom we have no data for all of their years.
# Query_97
query_table = pd.read_sql_query('''SELECT `Country`, SUM(`Electricity generation (TWh)`) AS Total_Electricity_Generated
FROM total_electricity
GROUP BY `Country`
HAVING Total_Electricity_Generated = 0;''', engine)
display(query_table)
| Country | Total_Electricity_Generated | |
|---|---|---|
| 0 | Bermuda | 0.0 |
| 1 | Gibraltar | 0.0 |
| 2 | Niue | 0.0 |
| 3 | Saint Helena | 0.0 |
Since one of our guiding questions is to take out: -
Clean Energy % = (Nuclear + Hydro + Wind + Solar)/ Total Electricity Generated
we cannot have value of Total_Electricity_Generated column as 0. Since anything divided by 0 is undefined.
Therefore we must delete these countries from ALL of the datasets. i.e. Nuclear, Hydro, Total_Electricity etc.
For deleting these countries from Nuclear, Hydro and Total_Electricity datasets, one can run the following queries: -
Country IN (SELECT total_electricity.Country
FROM total_electricity
GROUP BY total_electricity.`Country`
HAVING SUM(total_electricity.`Electricity generation (TWh)`) = 0);
Country IN (SELECT total_electricity.Country
FROM total_electricity
GROUP BY total_electricity.`Country`
HAVING SUM(total_electricity.`Electricity generation (TWh)`) = 0);
Country IN (SELECT total_electricity.Country
FROM total_electricity
GROUP BY total_electricity.`Country`
HAVING SUM(total_electricity.`Electricity generation (TWh)`) = 0);
However due to MySQL Timeout, we have to game the system and perform direct deletion.
# Query_98
engine.execute('''DELETE FROM nuclear WHERE `Country` REGEXP 'Bermuda|Gibraltar|Niue|Saint Helena';''')
# Reference: - https://stackoverflow.com/a/1127106/10912105
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46c04eb90>
# Query_99
engine.execute('''DELETE FROM hydro WHERE `Country` REGEXP 'Bermuda|Gibraltar|Niue|Saint Helena';''')
# Reference: - https://stackoverflow.com/a/1127106/10912105
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff4399ad9d0>
# Query_100
engine.execute('''DELETE FROM wind WHERE `Country` REGEXP 'Bermuda|Gibraltar|Niue|Saint Helena';''')
# Reference: - https://stackoverflow.com/a/1127106/10912105
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff438fc0cd0>
# Query_101
engine.execute('''DELETE FROM solar WHERE `Country` REGEXP 'Bermuda|Gibraltar|Niue|Saint Helena';''')
# Reference: - https://stackoverflow.com/a/1127106/10912105
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46c039110>
# Query_102
engine.execute('''DELETE FROM coal WHERE `Country` REGEXP 'Bermuda|Gibraltar|Niue|Saint Helena';''')
# Reference: - https://stackoverflow.com/a/1127106/10912105
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46c0cb790>
# Query_103
engine.execute('''DELETE FROM gas WHERE `Country` REGEXP 'Bermuda|Gibraltar|Niue|Saint Helena';''')
# Reference: - https://stackoverflow.com/a/1127106/10912105
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff4399ad990>
# Query_104
engine.execute('''DELETE FROM oil WHERE `Country` REGEXP 'Bermuda|Gibraltar|Niue|Saint Helena';''')
# Reference: - https://stackoverflow.com/a/1127106/10912105
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff438f72350>
# Query_105
engine.execute('''DELETE FROM total_electricity WHERE `Country` REGEXP 'Bermuda|Gibraltar|Niue|Saint Helena';''')
# Reference: - https://stackoverflow.com/a/1127106/10912105
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff438f729d0>
We have deleted the undesirable countries from all the 3 datasets. Now synchronisation of the 3 datasets need to happen.
1) All the extra values which are in Nuclear dataset but not in Total Electricity Generated dataset need to be eliminated.
2) All the extra values which are in Hydro dataset but not in Total Electricity Generated dataset need to be eliminated.
3) All the extra values which are in Wind dataset but not in Total Electricity Generated dataset need to be eliminated.
4) All the extra values which are in Solar dataset but not in Total Electricity Generated dataset need to be eliminated.
5) All the extra values which are in Coal dataset but not in Total Electricity Generated dataset need to be eliminated.
6) All the extra values which are in Gas dataset but not in Total Electricity Generated dataset need to be eliminated.
7) All the extra values which are in Oil dataset but not in Total Electricity Generated dataset need to be eliminated.
Why? Because we have to take out the precise value of Clean Energy % = (Nuclear + Hydro + Wind + Solar)/ Total Electricity Generated.
and of Dirty Energy % = (Coal + Gas + Oil)/ Total Electricity Generated.
If for a paticular record we have the values of Nuclear and Hydro but NULL for Total Electricity Generated, then we would run into problems because we cannot divide by NULL.
# Query_106
# Now we shall find out the extra values which are in Nuclear table but not in Total Electricity Generated table using JOINS.
# Reference: - https://stackoverflow.com/a/9954815/10912105
query_table = pd.read_sql_query('''SELECT nuclear.`Country`, nuclear.`Year`, total_electricity.`Country`, total_electricity.`Year`
FROM nuclear
LEFT JOIN total_electricity
ON nuclear.`Country` = total_electricity.`Country` AND nuclear.`Year` = total_electricity.`Year`
WHERE total_electricity.`Country` IS NULL OR total_electricity.`Year` IS NULL;''', engine)
display(query_table)
| Country | Year | Country | Year |
|---|
The above query gives an empty database. Therefore there are no extra values in Nuclear that are not in Total Electricity Generated.
# Query_107
# Now we shall find out the extra values which are in Hydro table but not in Total Electricity Generated table using JOINS.
# Reference: - https://stackoverflow.com/a/9954815/10912105
query_table = pd.read_sql_query('''SELECT hydro.`Country`, hydro.`Year`, total_electricity.`Country`, total_electricity.`Year`
FROM hydro
LEFT JOIN total_electricity
ON hydro.`Country` = total_electricity.`Country` AND hydro.`Year` = total_electricity.`Year`
WHERE total_electricity.`Country` IS NULL OR total_electricity.`Year` IS NULL;''', engine)
display(query_table)
| Country | Year | Country | Year |
|---|
The above query gives an empty database. Therefore there are no extra values in Hydro that are not in Total Electricity Generated.
# Query_108
# Now we shall find out the extra values which are in Wind table but not in Total Electricity Generated table using JOINS.
# Reference: - https://stackoverflow.com/a/9954815/10912105
query_table = pd.read_sql_query('''SELECT wind.`Country`, wind.`Year`, total_electricity.`Country`, total_electricity.`Year`
FROM wind
LEFT JOIN total_electricity
ON wind.`Country` = total_electricity.`Country` AND wind.`Year` = total_electricity.`Year`
WHERE total_electricity.`Country` IS NULL OR total_electricity.`Year` IS NULL;''', engine)
display(query_table)
| Country | Year | Country | Year |
|---|
The above query gives an empty database. Therefore there are no extra values in Wind that are not in Total Electricity Generated.
# Query_109
# Now we shall find out the extra values which are in Solar table but not in Total Electricity Generated table using JOINS.
# Reference: - https://stackoverflow.com/a/9954815/10912105
query_table = pd.read_sql_query('''SELECT solar.`Country`, solar.`Year`, total_electricity.`Country`, total_electricity.`Year`
FROM solar
LEFT JOIN total_electricity
ON solar.`Country` = total_electricity.`Country` AND solar.`Year` = total_electricity.`Year`
WHERE total_electricity.`Country` IS NULL OR total_electricity.`Year` IS NULL;''', engine)
display(query_table)
| Country | Year | Country | Year |
|---|
The above query gives an empty database. Therefore there are no extra values in Solar that are not in Total Electricity Generated.
# Query_110
# Now we shall find out the extra values which are in Coal table but not in Total Electricity Generated table using JOINS.
# Reference: - https://stackoverflow.com/a/9954815/10912105
query_table = pd.read_sql_query('''SELECT coal.`Country`, coal.`Year`, total_electricity.`Country`, total_electricity.`Year`
FROM coal
LEFT JOIN total_electricity
ON coal.`Country` = total_electricity.`Country` AND coal.`Year` = total_electricity.`Year`
WHERE total_electricity.`Country` IS NULL OR total_electricity.`Year` IS NULL;''', engine)
display(query_table)
| Country | Year | Country | Year |
|---|
The above query gives an empty database. Therefore there are no extra values in Coal that are not in Total Electricity Generated.
# Query_111
# Now we shall find out the extra values which are in Gas table but not in Total Electricity Generated table using JOINS.
# Reference: - https://stackoverflow.com/a/9954815/10912105
query_table = pd.read_sql_query('''SELECT gas.`Country`, gas.`Year`, total_electricity.`Country`, total_electricity.`Year`
FROM gas
LEFT JOIN total_electricity
ON gas.`Country` = total_electricity.`Country` AND gas.`Year` = total_electricity.`Year`
WHERE total_electricity.`Country` IS NULL OR total_electricity.`Year` IS NULL;''', engine)
display(query_table)
| Country | Year | Country | Year |
|---|
The above query gives an empty database. Therefore there are no extra values in Gas that are not in Total Electricity Generated.
# Query_112
# Now we shall find out the extra values which are in Oil table but not in Total Electricity Generated table using JOINS.
# Reference: - https://stackoverflow.com/a/9954815/10912105
query_table = pd.read_sql_query('''SELECT oil.`Country`, oil.`Year`, total_electricity.`Country`, total_electricity.`Year`
FROM oil
LEFT JOIN total_electricity
ON oil.`Country` = total_electricity.`Country` AND oil.`Year` = total_electricity.`Year`
WHERE total_electricity.`Country` IS NULL OR total_electricity.`Year` IS NULL;''', engine)
display(query_table)
| Country | Year | Country | Year |
|---|
The above query gives an empty database. Therefore there are no extra values in Oil that are not in Total Electricity Generated.
Let us confirm the above results by seeing the total number of records for each dataset.
# Query_113
# Number of records in Nuclear table
query_table = pd.read_sql_query('''SELECT COUNT(*) AS Nuclear_Count FROM nuclear;''', engine)
display(query_table)
| Nuclear_Count | |
|---|---|
| 0 | 5671 |
# Query_114
# Number of records in Hydro table
query_table = pd.read_sql_query('''SELECT COUNT(*) AS Hydro_Count FROM hydro;''', engine)
display(query_table)
| Hydro_Count | |
|---|---|
| 0 | 5682 |
# Query_115
# Number of records in Wind table
query_table = pd.read_sql_query('''SELECT COUNT(*) AS Wind_Count FROM wind;''', engine)
display(query_table)
| Wind_Count | |
|---|---|
| 0 | 5636 |
# Query_116
# Number of records in Solar table
query_table = pd.read_sql_query('''SELECT COUNT(*) AS Solar_Count FROM solar;''', engine)
display(query_table)
| Solar_Count | |
|---|---|
| 0 | 5636 |
# Query_117
# Number of records in Coal table
query_table = pd.read_sql_query('''SELECT COUNT(*) AS Coal_Count FROM coal;''', engine)
display(query_table)
| Coal_Count | |
|---|---|
| 0 | 5191 |
# Query_118
# Number of records in Gas table
query_table = pd.read_sql_query('''SELECT COUNT(*) AS Gas_Count FROM gas;''', engine)
display(query_table)
| Gas_Count | |
|---|---|
| 0 | 5191 |
# Query_119
# Number of records in Oil table
query_table = pd.read_sql_query('''SELECT COUNT(*) AS Oil_Count FROM oil;''', engine)
display(query_table)
| Oil_Count | |
|---|---|
| 0 | 5191 |
# Query_120
# Number of records in Total_electricity table
query_table = pd.read_sql_query('''SELECT COUNT(*) AS Total_Count FROM total_electricity;''', engine)
display(query_table)
| Total_Count | |
|---|---|
| 0 | 5686 |
As we can see from above results, The total_electricity has more records than both all of the dataset.
Let us now see the records which are in total_electricity but not in other datasets.
# Query_121
query_table = pd.read_sql_query('''SELECT total_electricity.`Country` AS Total_Country,
total_electricity.`Year` AS Total_Year,
nuclear.`Country` AS Nuclear_Country,
nuclear.`Year` AS Nuclear_Year
FROM nuclear
RIGHT JOIN total_electricity
ON nuclear.`Country` = total_electricity.`Country` AND nuclear.`Year` = total_electricity.`Year`
WHERE nuclear.`Country` IS NULL OR nuclear.`Year` IS NULL;''', engine)
display(query_table)
| Total_Country | Total_Year | Nuclear_Country | Nuclear_Year | |
|---|---|---|---|---|
| 0 | Algeria | 2021 | None | None |
| 1 | Colombia | 2021 | None | None |
| 2 | Iceland | 2021 | None | None |
| 3 | Indonesia | 2021 | None | None |
| 4 | Iraq | 2021 | None | None |
| 5 | Israel | 2021 | None | None |
| 6 | Kuwait | 2021 | None | None |
| 7 | Morocco | 2021 | None | None |
| 8 | Oman | 2021 | None | None |
| 9 | Qatar | 2021 | None | None |
| 10 | Sri Lanka | 2021 | None | None |
| 11 | Trinidad and Tobago | 2021 | None | None |
| 12 | Turkmenistan | 2021 | None | None |
| 13 | Uzbekistan | 2021 | None | None |
| 14 | Venezuela | 2021 | None | None |
The above records are in total_electricity dataset but not in nuclear dataset.
# Query_122
query_table = pd.read_sql_query('''SELECT total_electricity.`Country` AS Total_Country,
total_electricity.`Year` AS Total_Year,
hydro.`Country` AS Hydro_Country,
hydro.`Year` AS Hydro_Year
FROM hydro
RIGHT JOIN total_electricity
ON hydro.`Country` = total_electricity.`Country` AND hydro.`Year` = total_electricity.`Year`
WHERE hydro.`Country` IS NULL OR hydro.`Year` IS NULL;''', engine)
display(query_table)
| Total_Country | Total_Year | Hydro_Country | Hydro_Year | |
|---|---|---|---|---|
| 0 | Kuwait | 2021 | None | None |
| 1 | Oman | 2021 | None | None |
| 2 | Qatar | 2021 | None | None |
| 3 | United Arab Emirates | 2021 | None | None |
The above records are in total_electricity dataset but not in hydro dataset.
# Query_123
query_table = pd.read_sql_query('''SELECT total_electricity.`Country` AS Total_Country,
total_electricity.`Year` AS Total_Year,
wind.`Country` AS Wind_Country,
wind.`Year` AS Wind_Year
FROM wind
RIGHT JOIN total_electricity
ON wind.`Country` = total_electricity.`Country` AND wind.`Year` = total_electricity.`Year`
WHERE wind.`Country` IS NULL OR wind.`Year` IS NULL;''', engine)
display(query_table)
| Total_Country | Total_Year | Wind_Country | Wind_Year | |
|---|---|---|---|---|
| 0 | Azerbaijan | 1985 | None | None |
| 1 | Azerbaijan | 1986 | None | None |
| 2 | Azerbaijan | 1987 | None | None |
| 3 | Azerbaijan | 1988 | None | None |
| 4 | Azerbaijan | 1989 | None | None |
| 5 | Belarus | 1985 | None | None |
| 6 | Belarus | 1986 | None | None |
| 7 | Belarus | 1987 | None | None |
| 8 | Belarus | 1988 | None | None |
| 9 | Belarus | 1989 | None | None |
| 10 | Estonia | 1985 | None | None |
| 11 | Estonia | 1986 | None | None |
| 12 | Estonia | 1987 | None | None |
| 13 | Estonia | 1988 | None | None |
| 14 | Estonia | 1989 | None | None |
| 15 | Kazakhstan | 1985 | None | None |
| 16 | Kazakhstan | 1986 | None | None |
| 17 | Kazakhstan | 1987 | None | None |
| 18 | Kazakhstan | 1988 | None | None |
| 19 | Kazakhstan | 1989 | None | None |
| 20 | Latvia | 1985 | None | None |
| 21 | Latvia | 1986 | None | None |
| 22 | Latvia | 1987 | None | None |
| 23 | Latvia | 1988 | None | None |
| 24 | Latvia | 1989 | None | None |
| 25 | Lithuania | 1985 | None | None |
| 26 | Lithuania | 1986 | None | None |
| 27 | Lithuania | 1987 | None | None |
| 28 | Lithuania | 1988 | None | None |
| 29 | Lithuania | 1989 | None | None |
| 30 | Russia | 1985 | None | None |
| 31 | Russia | 1986 | None | None |
| 32 | Russia | 1987 | None | None |
| 33 | Russia | 1988 | None | None |
| 34 | Russia | 1989 | None | None |
| 35 | Turkmenistan | 1985 | None | None |
| 36 | Turkmenistan | 1986 | None | None |
| 37 | Turkmenistan | 1987 | None | None |
| 38 | Turkmenistan | 1988 | None | None |
| 39 | Turkmenistan | 1989 | None | None |
| 40 | Ukraine | 1985 | None | None |
| 41 | Ukraine | 1986 | None | None |
| 42 | Ukraine | 1987 | None | None |
| 43 | Ukraine | 1988 | None | None |
| 44 | Ukraine | 1989 | None | None |
| 45 | Uzbekistan | 1985 | None | None |
| 46 | Uzbekistan | 1986 | None | None |
| 47 | Uzbekistan | 1987 | None | None |
| 48 | Uzbekistan | 1988 | None | None |
| 49 | Uzbekistan | 1989 | None | None |
The above records are in total_electricity dataset but not in wind dataset.
# Query_124
query_table = pd.read_sql_query('''SELECT total_electricity.`Country` AS Total_Country,
total_electricity.`Year` AS Total_Year,
solar.`Country` AS Solar_Country,
solar.`Year` AS Solar_Year
FROM solar
RIGHT JOIN total_electricity
ON solar.`Country` = total_electricity.`Country` AND solar.`Year` = total_electricity.`Year`
WHERE solar.`Country` IS NULL OR solar.`Year` IS NULL;''', engine)
display(query_table)
| Total_Country | Total_Year | Solar_Country | Solar_Year | |
|---|---|---|---|---|
| 0 | Azerbaijan | 1985 | None | None |
| 1 | Azerbaijan | 1986 | None | None |
| 2 | Azerbaijan | 1987 | None | None |
| 3 | Azerbaijan | 1988 | None | None |
| 4 | Azerbaijan | 1989 | None | None |
| 5 | Belarus | 1985 | None | None |
| 6 | Belarus | 1986 | None | None |
| 7 | Belarus | 1987 | None | None |
| 8 | Belarus | 1988 | None | None |
| 9 | Belarus | 1989 | None | None |
| 10 | Estonia | 1985 | None | None |
| 11 | Estonia | 1986 | None | None |
| 12 | Estonia | 1987 | None | None |
| 13 | Estonia | 1988 | None | None |
| 14 | Estonia | 1989 | None | None |
| 15 | Kazakhstan | 1985 | None | None |
| 16 | Kazakhstan | 1986 | None | None |
| 17 | Kazakhstan | 1987 | None | None |
| 18 | Kazakhstan | 1988 | None | None |
| 19 | Kazakhstan | 1989 | None | None |
| 20 | Latvia | 1985 | None | None |
| 21 | Latvia | 1986 | None | None |
| 22 | Latvia | 1987 | None | None |
| 23 | Latvia | 1988 | None | None |
| 24 | Latvia | 1989 | None | None |
| 25 | Lithuania | 1985 | None | None |
| 26 | Lithuania | 1986 | None | None |
| 27 | Lithuania | 1987 | None | None |
| 28 | Lithuania | 1988 | None | None |
| 29 | Lithuania | 1989 | None | None |
| 30 | Russia | 1985 | None | None |
| 31 | Russia | 1986 | None | None |
| 32 | Russia | 1987 | None | None |
| 33 | Russia | 1988 | None | None |
| 34 | Russia | 1989 | None | None |
| 35 | Turkmenistan | 1985 | None | None |
| 36 | Turkmenistan | 1986 | None | None |
| 37 | Turkmenistan | 1987 | None | None |
| 38 | Turkmenistan | 1988 | None | None |
| 39 | Turkmenistan | 1989 | None | None |
| 40 | Ukraine | 1985 | None | None |
| 41 | Ukraine | 1986 | None | None |
| 42 | Ukraine | 1987 | None | None |
| 43 | Ukraine | 1988 | None | None |
| 44 | Ukraine | 1989 | None | None |
| 45 | Uzbekistan | 1985 | None | None |
| 46 | Uzbekistan | 1986 | None | None |
| 47 | Uzbekistan | 1987 | None | None |
| 48 | Uzbekistan | 1988 | None | None |
| 49 | Uzbekistan | 1989 | None | None |
The above records are in total_electricity dataset but not in solar dataset.
# Query_125
query_table = pd.read_sql_query('''SELECT total_electricity.`Country` AS Total_Country,
total_electricity.`Year` AS Total_Year,
coal.`Country` AS Coal_Country,
coal.`Year` AS Coal_Year
FROM coal
RIGHT JOIN total_electricity
ON coal.`Country` = total_electricity.`Country` AND coal.`Year` = total_electricity.`Year`
WHERE coal.`Country` IS NULL OR coal.`Year` IS NULL;''', engine)
display(query_table)
| Total_Country | Total_Year | Coal_Country | Coal_Year | |
|---|---|---|---|---|
| 0 | Algeria | 1985 | None | None |
| 1 | Algeria | 1986 | None | None |
| 2 | Algeria | 1987 | None | None |
| 3 | Algeria | 1988 | None | None |
| 4 | Algeria | 1989 | None | None |
| ... | ... | ... | ... | ... |
| 490 | Venezuela | 1996 | None | None |
| 491 | Venezuela | 1997 | None | None |
| 492 | Venezuela | 1998 | None | None |
| 493 | Venezuela | 1999 | None | None |
| 494 | Venezuela | 2021 | None | None |
495 rows × 4 columns
The above records are in total_electricity dataset but not in coal dataset.
# Query_126
query_table = pd.read_sql_query('''SELECT total_electricity.`Country` AS Total_Country,
total_electricity.`Year` AS Total_Year,
gas.`Country` AS Gas_Country,
gas.`Year` AS Gas_Year
FROM gas
RIGHT JOIN total_electricity
ON gas.`Country` = total_electricity.`Country` AND gas.`Year` = total_electricity.`Year`
WHERE gas.`Country` IS NULL OR gas.`Year` IS NULL;''', engine)
display(query_table)
| Total_Country | Total_Year | Gas_Country | Gas_Year | |
|---|---|---|---|---|
| 0 | Algeria | 1985 | None | None |
| 1 | Algeria | 1986 | None | None |
| 2 | Algeria | 1987 | None | None |
| 3 | Algeria | 1988 | None | None |
| 4 | Algeria | 1989 | None | None |
| ... | ... | ... | ... | ... |
| 490 | Venezuela | 1996 | None | None |
| 491 | Venezuela | 1997 | None | None |
| 492 | Venezuela | 1998 | None | None |
| 493 | Venezuela | 1999 | None | None |
| 494 | Venezuela | 2021 | None | None |
495 rows × 4 columns
The above records are in total_electricity dataset but not in gas dataset.
# Query_127
query_table = pd.read_sql_query('''SELECT total_electricity.`Country` AS Total_Country,
total_electricity.`Year` AS Total_Year,
oil.`Country` AS Oil_Country,
oil.`Year` AS Oil_Year
FROM oil
RIGHT JOIN total_electricity
ON oil.`Country` = total_electricity.`Country` AND oil.`Year` = total_electricity.`Year`
WHERE oil.`Country` IS NULL OR oil.`Year` IS NULL;''', engine)
display(query_table)
| Total_Country | Total_Year | Oil_Country | Oil_Year | |
|---|---|---|---|---|
| 0 | Algeria | 1985 | None | None |
| 1 | Algeria | 1986 | None | None |
| 2 | Algeria | 1987 | None | None |
| 3 | Algeria | 1988 | None | None |
| 4 | Algeria | 1989 | None | None |
| ... | ... | ... | ... | ... |
| 490 | Venezuela | 1996 | None | None |
| 491 | Venezuela | 1997 | None | None |
| 492 | Venezuela | 1998 | None | None |
| 493 | Venezuela | 1999 | None | None |
| 494 | Venezuela | 2021 | None | None |
495 rows × 4 columns
The above records are in total_electricity dataset but not in oil dataset.
Finding country wise trend of usage of Eco-friendly fuel V/S Dirty fuel sources: -
Clean Energy % = (Nuclear + Hydro + Wind + Solar)/ Total Electricity Generated.
Dirty Energy % = (Coal + Gas + Oil)/ Total Electricity Generated.
# Query_128
# Since we would be dividing by Total Electricity Generated, we must make sure there are no more NULL values / 0 values in the denominator.
# Therefore we would delete all those records which have 0/NULL anywhere in the Total Electricity Generated.
engine.execute('''DELETE FROM total_electricity WHERE total_electricity.`Electricity generation (TWh)` = 0 OR total_electricity.`Electricity generation (TWh)` IS NULL;''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff46bc8d610>
# Query_129
# Now we shall create a table exclusively for Clean Electricity Energy Generation
# References: -
# https://stackoverflow.com/a/5859469/10912105
engine.execute('''
CREATE TEMPORARY TABLE IF NOT EXISTS clean_generation AS
(SELECT
total_electricity.`Country` AS Country,
total_electricity.`Year` AS Year,
nuclear.`Electricity from nuclear (TWh)` AS Nuclear,
hydro.`Electricity from hydro (TWh)` AS Hydro,
wind.`Electricity from wind (TWh)` AS Wind,
solar.`Electricity from solar (TWh)` AS Solar,
total_electricity.`Electricity generation (TWh)` AS Total,
(((nuclear.`Electricity from nuclear (TWh)` + hydro.`Electricity from hydro (TWh)` + wind.`Electricity from wind (TWh)` + solar.`Electricity from solar (TWh)`)/total_electricity.`Electricity generation (TWh)`)*100) AS Clean_Percentage
FROM
total_electricity
INNER JOIN
nuclear
ON
total_electricity.`Country` = nuclear.`Country` AND total_electricity.`Year` = nuclear.`Year`
INNER JOIN
hydro
ON
total_electricity.`Country` = hydro.`Country` AND total_electricity.`Year` = hydro.`Year`
INNER JOIN
wind
ON
total_electricity.`Country` = wind.`Country` AND total_electricity.`Year` = wind.`Year`
INNER JOIN
solar
ON
total_electricity.`Country` = solar.`Country` AND total_electricity.`Year` = solar.`Year`
GROUP BY
total_electricity.`Country`, total_electricity.`Year`);
''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff4783e3590>
# Query_130
# Let's see our table
query_table = pd.read_sql_query('''SELECT * FROM clean_generation;''', engine)
display(query_table)
| Country | Year | Nuclear | Hydro | Wind | Solar | Total | Clean_Percentage | |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2000 | 0.0 | 0.31 | 0.0 | 0.00 | 0.47 | 65.957447 |
| 1 | Afghanistan | 2001 | 0.0 | 0.50 | 0.0 | 0.00 | 0.59 | 84.745763 |
| 2 | Afghanistan | 2002 | 0.0 | 0.56 | 0.0 | 0.00 | 0.69 | 81.159421 |
| 3 | Afghanistan | 2003 | 0.0 | 0.63 | 0.0 | 0.00 | 0.94 | 67.021276 |
| 4 | Afghanistan | 2004 | 0.0 | 0.56 | 0.0 | 0.00 | 0.89 | 62.921349 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5586 | Zimbabwe | 2016 | 0.0 | 2.95 | 0.0 | 0.01 | 6.82 | 43.401760 |
| 5587 | Zimbabwe | 2017 | 0.0 | 3.93 | 0.0 | 0.01 | 7.31 | 53.898770 |
| 5588 | Zimbabwe | 2018 | 0.0 | 5.00 | 0.0 | 0.01 | 9.13 | 54.874042 |
| 5589 | Zimbabwe | 2019 | 0.0 | 7.26 | 0.0 | 0.01 | 11.31 | 64.279401 |
| 5590 | Zimbabwe | 2020 | 0.0 | 7.26 | 0.0 | 0.01 | 11.11 | 65.436546 |
5591 rows × 8 columns
# Query_131
# Exploring our clean_generation table
query_table = pd.read_sql_query('''SELECT * FROM clean_generation
GROUP BY `Country`, `Year`
ORDER BY `Total` DESC, `Clean_Percentage` DESC;''', engine)
display(query_table)
| Country | Year | Nuclear | Hydro | Wind | Solar | Total | Clean_Percentage | |
|---|---|---|---|---|---|---|---|---|
| 0 | China | 2021 | 407.500000 | 1300.000000 | 655.599976 | 327.000000 | 8534.250 | 31.521223 |
| 1 | China | 2020 | 366.200012 | 1321.708984 | 466.500000 | 261.100006 | 7779.060 | 31.051425 |
| 2 | China | 2019 | 348.700012 | 1272.537964 | 405.299988 | 224.000000 | 7503.428 | 29.993464 |
| 3 | China | 2018 | 295.000000 | 1198.886963 | 365.799988 | 176.899994 | 7166.133 | 28.419609 |
| 4 | China | 2017 | 248.100006 | 1165.064941 | 304.600006 | 117.800003 | 6604.447 | 27.792864 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5586 | Kiribati | 2001 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010 | 0.000000 |
| 5587 | Kiribati | 2002 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010 | 0.000000 |
| 5588 | Montserrat | 2000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010 | 0.000000 |
| 5589 | Montserrat | 2001 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010 | 0.000000 |
| 5590 | Montserrat | 2020 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010 | 0.000000 |
5591 rows × 8 columns
# Query_132
# Now we shall create a table exclusively for Dirty Electricity Energy Generation
# References: -
# https://stackoverflow.com/a/5859469/10912105
engine.execute('''
CREATE TEMPORARY TABLE IF NOT EXISTS dirty_generation AS
(SELECT
total_electricity.`Country` AS Country,
total_electricity.`Year` AS Year,
coal.`Electricity from coal (TWh)` AS Coal,
gas.`Electricity from gas (TWh)` AS Gas,
oil.`Electricity from oil (TWh)` AS Oil,
total_electricity.`Electricity generation (TWh)` AS Total,
(((coal.`Electricity from coal (TWh)` + gas.`Electricity from gas (TWh)` + oil.`Electricity from oil (TWh)`)/total_electricity.`Electricity generation (TWh)`)*100) AS Dirty_Percentage
FROM
total_electricity
INNER JOIN
coal
ON
total_electricity.`Country` = coal.`Country` AND total_electricity.`Year` = coal.`Year`
INNER JOIN
gas
ON
total_electricity.`Country` = gas.`Country` AND total_electricity.`Year` = gas.`Year`
INNER JOIN
oil
ON
total_electricity.`Country` = oil.`Country` AND total_electricity.`Year` = oil.`Year`
GROUP BY
total_electricity.`Country`, total_electricity.`Year`);
''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff4397daa50>
# Query_133
# Let's see our table
query_table = pd.read_sql_query('''SELECT * FROM dirty_generation;''', engine)
display(query_table)
| Country | Year | Coal | Gas | Oil | Total | Dirty_Percentage | |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2000 | 0.00 | 0.0 | 0.16 | 0.47 | 34.042552 |
| 1 | Afghanistan | 2001 | 0.00 | 0.0 | 0.09 | 0.59 | 15.254238 |
| 2 | Afghanistan | 2002 | 0.00 | 0.0 | 0.13 | 0.69 | 18.840579 |
| 3 | Afghanistan | 2003 | 0.00 | 0.0 | 0.31 | 0.94 | 32.978724 |
| 4 | Afghanistan | 2004 | 0.00 | 0.0 | 0.33 | 0.89 | 37.078653 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5157 | Zimbabwe | 2016 | 3.46 | 0.0 | 0.04 | 6.82 | 51.319649 |
| 5158 | Zimbabwe | 2017 | 3.02 | 0.0 | 0.03 | 7.31 | 41.723666 |
| 5159 | Zimbabwe | 2018 | 3.69 | 0.0 | 0.04 | 9.13 | 40.854327 |
| 5160 | Zimbabwe | 2019 | 3.62 | 0.0 | 0.04 | 11.31 | 32.360742 |
| 5161 | Zimbabwe | 2020 | 3.44 | 0.0 | 0.04 | 11.11 | 31.323133 |
5162 rows × 7 columns
# Query_134
# Exploring our dirty_generation table
query_table = pd.read_sql_query('''SELECT * FROM dirty_generation
GROUP BY `Country`, `Year`
ORDER BY `Total` DESC, `Dirty_Percentage` DESC;''', engine)
display(query_table)
| Country | Year | Coal | Gas | Oil | Total | Dirty_Percentage | |
|---|---|---|---|---|---|---|---|
| 0 | China | 2021 | 5339.144043 | 272.597992 | 12.248143 | 8534.250 | 65.899056 |
| 1 | China | 2020 | 4920.826172 | 252.500000 | 10.799000 | 7779.060 | 66.642052 |
| 2 | China | 2019 | 4855.192871 | 232.500000 | 10.528831 | 7503.428 | 67.945234 |
| 3 | China | 2018 | 4763.917480 | 215.500000 | 10.861873 | 7166.133 | 69.636990 |
| 4 | China | 2017 | 4430.028320 | 203.199997 | 9.868982 | 6604.447 | 70.302590 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5157 | Kiribati | 2001 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 99.999998 |
| 5158 | Kiribati | 2002 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 99.999998 |
| 5159 | Montserrat | 2000 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 99.999998 |
| 5160 | Montserrat | 2001 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 99.999998 |
| 5161 | Montserrat | 2020 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 99.999998 |
5162 rows × 7 columns
# Query_135
# Joining our clean_generation table and dirty_generation table to make our final table.
engine.execute('''
CREATE TEMPORARY TABLE IF NOT EXISTS meta_generation AS
(SELECT
clean_generation.`Country`,
clean_generation.`Year`,
clean_generation.`Nuclear`,
clean_generation.`Hydro`,
clean_generation.`Wind`,
clean_generation.`Solar`,
dirty_generation.`Coal`,
dirty_generation.`Gas`,
dirty_generation.`Oil`,
clean_generation.`Total`,
clean_generation.`Clean_Percentage`,
dirty_generation.`Dirty_Percentage`
FROM
clean_generation
INNER JOIN
dirty_generation
ON
clean_generation.`Country` = dirty_generation.`Country` AND clean_generation.`Year` = dirty_generation.`Year`);
''')
<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7ff44010cb10>
# Query_136
# Let's see our final table
query_table = pd.read_sql_query('''SELECT * FROM meta_generation;''', engine)
display(query_table)
| Country | Year | Nuclear | Hydro | Wind | Solar | Coal | Gas | Oil | Total | Clean_Percentage | Dirty_Percentage | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2000 | 0.0 | 0.31 | 0.0 | 0.00 | 0.00 | 0.0 | 0.16 | 0.47 | 65.957447 | 34.042552 |
| 1 | Afghanistan | 2001 | 0.0 | 0.50 | 0.0 | 0.00 | 0.00 | 0.0 | 0.09 | 0.59 | 84.745763 | 15.254238 |
| 2 | Afghanistan | 2002 | 0.0 | 0.56 | 0.0 | 0.00 | 0.00 | 0.0 | 0.13 | 0.69 | 81.159421 | 18.840579 |
| 3 | Afghanistan | 2003 | 0.0 | 0.63 | 0.0 | 0.00 | 0.00 | 0.0 | 0.31 | 0.94 | 67.021276 | 32.978724 |
| 4 | Afghanistan | 2004 | 0.0 | 0.56 | 0.0 | 0.00 | 0.00 | 0.0 | 0.33 | 0.89 | 62.921349 | 37.078653 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5140 | Zimbabwe | 2016 | 0.0 | 2.95 | 0.0 | 0.01 | 3.46 | 0.0 | 0.04 | 6.82 | 43.401760 | 51.319649 |
| 5141 | Zimbabwe | 2017 | 0.0 | 3.93 | 0.0 | 0.01 | 3.02 | 0.0 | 0.03 | 7.31 | 53.898770 | 41.723666 |
| 5142 | Zimbabwe | 2018 | 0.0 | 5.00 | 0.0 | 0.01 | 3.69 | 0.0 | 0.04 | 9.13 | 54.874042 | 40.854327 |
| 5143 | Zimbabwe | 2019 | 0.0 | 7.26 | 0.0 | 0.01 | 3.62 | 0.0 | 0.04 | 11.31 | 64.279401 | 32.360742 |
| 5144 | Zimbabwe | 2020 | 0.0 | 7.26 | 0.0 | 0.01 | 3.44 | 0.0 | 0.04 | 11.11 | 65.436546 | 31.323133 |
5145 rows × 12 columns
# Query_137
# Exploring our meta_generation table
query_table = pd.read_sql_query('''SELECT * FROM meta_generation
GROUP BY `Country`, `Year`
ORDER BY `Total` DESC''', engine)
display(query_table)
| Country | Year | Nuclear | Hydro | Wind | Solar | Coal | Gas | Oil | Total | Clean_Percentage | Dirty_Percentage | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 2021 | 407.500000 | 1300.000000 | 655.599976 | 327.000000 | 5339.144043 | 272.597992 | 12.248143 | 8534.250 | 31.521223 | 65.899056 |
| 1 | China | 2020 | 366.200012 | 1321.708984 | 466.500000 | 261.100006 | 4920.826172 | 252.500000 | 10.799000 | 7779.060 | 31.051425 | 66.642052 |
| 2 | China | 2019 | 348.700012 | 1272.537964 | 405.299988 | 224.000000 | 4855.192871 | 232.500000 | 10.528831 | 7503.428 | 29.993464 | 67.945234 |
| 3 | China | 2018 | 295.000000 | 1198.886963 | 365.799988 | 176.899994 | 4763.917480 | 215.500000 | 10.861873 | 7166.133 | 28.419609 | 69.636990 |
| 4 | China | 2017 | 248.100006 | 1165.064941 | 304.600006 | 117.800003 | 4430.028320 | 203.199997 | 9.868982 | 6604.447 | 27.792864 | 70.302590 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5140 | Kiribati | 2001 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 0.000000 | 99.999998 |
| 5141 | Kiribati | 2002 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 0.000000 | 99.999998 |
| 5142 | Montserrat | 2000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 0.000000 | 99.999998 |
| 5143 | Montserrat | 2001 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 0.000000 | 99.999998 |
| 5144 | Montserrat | 2020 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 | 0.010 | 0.000000 | 99.999998 |
5145 rows × 12 columns
# Query_138
# Taking out the ranks for relevant columns.
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
MAX(`Year`) AS Latest_Year,
(`Nuclear` + `Hydro` + `Wind` + `Solar`) AS Clean_generation,
(`Coal` + `Gas` + `Oil`) AS Dirty_generation,
`Total`,
`Clean_Percentage`,
`Dirty_Percentage`,
RANK() OVER(ORDER BY Clean_generation DESC) AS Clean_Rank,
RANK() OVER(ORDER BY Dirty_generation DESC) AS Dirty_Rank,
RANK() OVER(ORDER BY Clean_Percentage DESC) AS Clean_Percentage_Rank,
RANK() OVER(ORDER BY Dirty_Percentage DESC) AS Dirty_Percentage_Rank
FROM
meta_generation
WHERE
`YEAR` = 2021
GROUP BY
`Country`, `Year`
ORDER BY
`Total` DESC;
''', engine)
display(query_table)
| Country | Latest_Year | Clean_generation | Dirty_generation | Total | Clean_Percentage | Dirty_Percentage | Clean_Rank | Dirty_Rank | Clean_Percentage_Rank | Dirty_Percentage_Rank | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 2021 | 2690.099976 | 5623.990178 | 8534.250000 | 31.521223 | 65.899056 | 1 | 1 | 47 | 27 |
| 1 | United States | 2021 | 1625.756805 | 2692.444366 | 4406.413000 | 36.895243 | 61.102860 | 2 | 2 | 42 | 34 |
| 2 | India | 2021 | 340.643444 | 1337.623460 | 1714.820900 | 19.864666 | 78.003683 | 7 | 3 | 58 | 19 |
| 3 | Russia | 2021 | 441.821898 | 710.020799 | 1157.085800 | 38.184022 | 61.362848 | 6 | 4 | 40 | 33 |
| 4 | Japan | 2021 | 233.375048 | 659.267569 | 1019.721250 | 22.886161 | 64.651744 | 9 | 5 | 57 | 29 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 74 | Lithuania | 2021 | 1.870149 | 1.360000 | 4.783984 | 39.091871 | 28.428191 | 65 | 75 | 37 | 60 |
| 75 | Montenegro | 2021 | 2.180000 | 1.470000 | 3.650000 | 59.726028 | 40.273973 | 64 | 73 | 23 | 50 |
| 76 | Malta | 2021 | 0.210000 | 2.240000 | 2.460000 | 8.536585 | 91.056913 | 78 | 70 | 71 | 6 |
| 77 | Luxembourg | 2021 | 0.665714 | 0.270000 | 2.306064 | 28.867992 | 11.708263 | 75 | 77 | 52 | 73 |
| 78 | Burundi | 2021 | 0.170000 | 0.140000 | 0.320000 | 53.124999 | 43.750000 | 79 | 78 | 27 | 46 |
79 rows × 11 columns
# Query_139
# Now we will only display those data which has complete `Year` values i.e. from 1985 till 2021.
# Reference: - https://stackoverflow.com/a/74871098/10912105
query_table = pd.read_sql_query(
'''
WITH Countries
AS
(
SELECT `Country`
FROM meta_generation
WHERE `Year` BETWEEN 1985 AND 2021
GROUP BY `Country`
HAVING COUNT(DISTINCT Year) = 37
)
SELECT *
FROM meta_generation
INNER JOIN Countries ON meta_generation.`Country` = Countries.`Country`;
''', engine)
display(query_table)
| Country | Year | Nuclear | Hydro | Wind | Solar | Coal | Gas | Oil | Total | Clean_Percentage | Dirty_Percentage | Country | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | 1985 | 5.766 | 20.653124 | 0.000000 | 0.000000 | 0.607000 | 12.450280 | 5.706000 | 45.303726 | 58.315565 | 41.416638 | Argentina |
| 1 | Argentina | 1986 | 5.711 | 21.030247 | 0.000000 | 0.000000 | 1.013000 | 13.251092 | 7.894000 | 49.022194 | 54.549265 | 45.200123 | Argentina |
| 2 | Argentina | 1987 | 6.465 | 21.912518 | 0.000000 | 0.000000 | 1.016000 | 12.537193 | 10.078000 | 52.115280 | 54.451435 | 45.344078 | Argentina |
| 3 | Argentina | 1988 | 5.798 | 15.259144 | 0.000000 | 0.000000 | 1.586000 | 21.067162 | 8.635000 | 52.499520 | 40.109212 | 59.597043 | Argentina |
| 4 | Argentina | 1989 | 5.039 | 13.187393 | 0.000000 | 0.000000 | 0.865000 | 24.336302 | 7.204000 | 50.862812 | 35.834419 | 63.711188 | Argentina |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 920 | Vietnam | 2017 | 0.000 | 87.598999 | 0.350000 | 0.007992 | 62.612999 | 39.955456 | 0.991100 | 191.593000 | 45.908249 | 54.051847 | Vietnam |
| 921 | Vietnam | 2018 | 0.000 | 84.488998 | 0.487000 | 0.104992 | 83.845001 | 39.916496 | 0.212300 | 209.181000 | 40.673384 | 59.266280 | Vietnam |
| 922 | Vietnam | 2019 | 0.000 | 66.542000 | 0.722000 | 5.251790 | 111.180000 | 42.184689 | 1.416000 | 227.422680 | 31.885909 | 68.058599 | Vietnam |
| 923 | Vietnam | 2020 | 0.000 | 73.382004 | 1.073790 | 10.860000 | 114.764999 | 33.951771 | 1.251600 | 235.410370 | 36.241306 | 63.705082 | Vietnam |
| 924 | Vietnam | 2021 | 0.000 | 75.899475 | 2.391613 | 25.770420 | 114.123138 | 26.223457 | 0.210715 | 244.776860 | 42.512805 | 57.422630 | Vietnam |
925 rows × 13 columns
# Query_140
# Taking out the ranks for relevant columns from the subset of our dataset (only choosing countries with years from year 1985 till 2021).
query_table = pd.read_sql_query(
'''
WITH Countries
AS
(
SELECT `Country`
FROM meta_generation
WHERE `Year` BETWEEN 1985 AND 2021
GROUP BY `Country`
HAVING COUNT(DISTINCT Year) = 37
)
SELECT
meta_generation.`Country`,
MAX(`Year`) AS Latest_Year,
(`Nuclear` + `Hydro` + `Wind` + `Solar`) AS Clean_generation,
(`Coal` + `Gas` + `Oil`) AS Dirty_generation,
`Total`,
`Clean_Percentage`,
`Dirty_Percentage`,
RANK() OVER(ORDER BY Clean_generation DESC) AS Clean_Rank,
RANK() OVER(ORDER BY Dirty_generation DESC) AS Dirty_Rank,
RANK() OVER(ORDER BY Clean_Percentage DESC) AS Clean_Percentage_Rank,
RANK() OVER(ORDER BY Dirty_Percentage DESC) AS Dirty_Percentage_Rank
FROM
meta_generation
INNER JOIN
Countries ON meta_generation.`Country` = Countries.`Country`
WHERE
`YEAR` = 2021
GROUP BY
meta_generation.`Country`, `Year`
ORDER BY
`Total` DESC;
''', engine)
display(query_table)
| Country | Latest_Year | Clean_generation | Dirty_generation | Total | Clean_Percentage | Dirty_Percentage | Clean_Rank | Dirty_Rank | Clean_Percentage_Rank | Dirty_Percentage_Rank | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 2021 | 2690.099976 | 5623.990178 | 8534.25000 | 31.521223 | 65.899056 | 1 | 1 | 8 | 13 |
| 1 | United States | 2021 | 1625.756805 | 2692.444366 | 4406.41300 | 36.895243 | 61.102860 | 2 | 2 | 7 | 18 |
| 2 | India | 2021 | 340.643444 | 1337.623460 | 1714.82090 | 19.864666 | 78.003683 | 5 | 3 | 17 | 9 |
| 3 | Japan | 2021 | 233.375048 | 659.267569 | 1019.72125 | 22.886161 | 64.651744 | 7 | 4 | 16 | 15 |
| 4 | Brazil | 2021 | 466.561293 | 132.829512 | 654.35004 | 71.301485 | 20.299458 | 4 | 20 | 2 | 24 |
| 5 | Canada | 2021 | 513.085276 | 117.507673 | 641.02625 | 80.041227 | 18.331180 | 3 | 22 | 1 | 25 |
| 6 | South Korea | 2021 | 186.052978 | 395.025376 | 600.43400 | 30.986416 | 65.789975 | 8 | 5 | 9 | 14 |
| 7 | Germany | 2021 | 254.799997 | 256.400006 | 584.52570 | 43.590897 | 43.864625 | 6 | 8 | 4 | 21 |
| 8 | Iran | 2021 | 20.222286 | 337.604826 | 357.83780 | 5.651244 | 94.345770 | 23 | 7 | 24 | 2 |
| 9 | Saudi Arabia | 2021 | 0.830416 | 355.792236 | 356.62265 | 0.232856 | 99.767145 | 25 | 6 | 25 | 1 |
| 10 | Mexico | 2021 | 79.455140 | 249.667418 | 336.02860 | 23.645350 | 74.299455 | 14 | 9 | 15 | 10 |
| 11 | Turkey | 2021 | 99.666527 | 214.932570 | 333.32654 | 29.900567 | 64.481085 | 12 | 11 | 11 | 16 |
| 12 | United Kingdom | 2021 | 127.726321 | 132.216399 | 309.85700 | 41.221054 | 42.670134 | 10 | 21 | 6 | 22 |
| 13 | Taiwan | 2021 | 41.419122 | 242.536161 | 290.92120 | 14.237231 | 83.368335 | 17 | 10 | 19 | 6 |
| 14 | Italy | 2021 | 88.741413 | 169.216186 | 287.20328 | 30.898468 | 58.918612 | 13 | 15 | 10 | 19 |
| 15 | Spain | 2021 | 175.230650 | 85.581920 | 272.13388 | 64.391339 | 31.448462 | 9 | 24 | 3 | 23 |
| 16 | Australia | 2021 | 73.985860 | 189.736693 | 267.45227 | 27.663201 | 70.942263 | 15 | 13 | 13 | 11 |
| 17 | Vietnam | 2021 | 104.061508 | 140.557311 | 244.77686 | 42.512805 | 57.422630 | 11 | 19 | 5 | 20 |
| 18 | South Africa | 2021 | 27.912287 | 211.134950 | 244.32200 | 11.424385 | 86.416675 | 20 | 12 | 22 | 4 |
| 19 | Egypt | 2021 | 25.094481 | 184.561012 | 209.65550 | 11.969389 | 88.030608 | 21 | 14 | 21 | 3 |
| 20 | Poland | 2021 | 22.490999 | 148.586001 | 180.00032 | 12.494977 | 82.547632 | 22 | 17 | 20 | 7 |
| 21 | Malaysia | 2021 | 33.859617 | 141.748340 | 177.24567 | 19.103213 | 79.972808 | 18 | 18 | 18 | 8 |
| 22 | Thailand | 2021 | 13.452449 | 149.893858 | 176.32740 | 7.629245 | 85.008829 | 24 | 16 | 23 | 5 |
| 23 | Argentina | 2021 | 45.562223 | 104.182447 | 152.46538 | 29.883652 | 68.331871 | 16 | 23 | 12 | 12 |
| 24 | Netherlands | 2021 | 33.244860 | 75.459025 | 121.61600 | 27.335926 | 62.046955 | 19 | 25 | 14 | 17 |
# Query_141
# Doing more cleanup.
engine.execute(
'''
CREATE TEMPORARY TABLE IF NOT EXISTS Countries AS
(
SELECT `Country`
FROM meta_generation
WHERE `Year` BETWEEN 1985 AND 2021
GROUP BY `Country`
HAVING COUNT(DISTINCT Year) = 37
);
''')
query_table = pd.read_sql_query(
'''
SELECT * FROM meta_generation WHERE meta_generation.`Country` IN (SELECT * FROM Countries);
''', engine)
display(query_table)
| Country | Year | Nuclear | Hydro | Wind | Solar | Coal | Gas | Oil | Total | Clean_Percentage | Dirty_Percentage | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | 1985 | 5.766 | 20.653124 | 0.000000 | 0.000000 | 0.607000 | 12.450280 | 5.706000 | 45.303726 | 58.315565 | 41.416638 |
| 1 | Argentina | 1986 | 5.711 | 21.030247 | 0.000000 | 0.000000 | 1.013000 | 13.251092 | 7.894000 | 49.022194 | 54.549265 | 45.200123 |
| 2 | Argentina | 1987 | 6.465 | 21.912518 | 0.000000 | 0.000000 | 1.016000 | 12.537193 | 10.078000 | 52.115280 | 54.451435 | 45.344078 |
| 3 | Argentina | 1988 | 5.798 | 15.259144 | 0.000000 | 0.000000 | 1.586000 | 21.067162 | 8.635000 | 52.499520 | 40.109212 | 59.597043 |
| 4 | Argentina | 1989 | 5.039 | 13.187393 | 0.000000 | 0.000000 | 0.865000 | 24.336302 | 7.204000 | 50.862812 | 35.834419 | 63.711188 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 920 | Vietnam | 2017 | 0.000 | 87.598999 | 0.350000 | 0.007992 | 62.612999 | 39.955456 | 0.991100 | 191.593000 | 45.908249 | 54.051847 |
| 921 | Vietnam | 2018 | 0.000 | 84.488998 | 0.487000 | 0.104992 | 83.845001 | 39.916496 | 0.212300 | 209.181000 | 40.673384 | 59.266280 |
| 922 | Vietnam | 2019 | 0.000 | 66.542000 | 0.722000 | 5.251790 | 111.180000 | 42.184689 | 1.416000 | 227.422680 | 31.885909 | 68.058599 |
| 923 | Vietnam | 2020 | 0.000 | 73.382004 | 1.073790 | 10.860000 | 114.764999 | 33.951771 | 1.251600 | 235.410370 | 36.241306 | 63.705082 |
| 924 | Vietnam | 2021 | 0.000 | 75.899475 | 2.391613 | 25.770420 | 114.123138 | 26.223457 | 0.210715 | 244.776860 | 42.512805 | 57.422630 |
925 rows × 12 columns
# We would now display the results of only Nuclear energy
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Nuclear`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Nuclear` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Nuclear` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Nuclear", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,1000])
fig.show()
# We would now display the results of only Hydro energy.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Hydro`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Hydro` > 10
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Hydro` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Hydro", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,1500])
fig.show()
# We would now display the results of only Wind energy.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Wind`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Wind` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Wind` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Wind", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,700])
fig.show()
# We would now display the results of only Solar energy.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Solar`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Solar` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Solar` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Solar", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,350])
fig.show()
# We would now display the results of only Coal energy.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Coal`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Coal` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Coal` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Coal", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,5500])
fig.show()
# We would now display the results of only Gas energy.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Gas`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Gas` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Gas` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Gas", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,2000])
fig.show()
# We would now display the results of only Oil energy.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Oil`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Oil` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Oil` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Oil", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,300])
fig.show()
# We would now display the results of Total energy.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Total` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Total` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Total", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,9000])
fig.show()
# We would now display the results of Clean energy %.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Clean_Percentage`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Clean_Percentage` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Clean_Percentage` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Clean_Percentage", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,100])
fig.show()
# We would now display the results of Dirty energy %.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
`Dirty_Percentage`,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries) AND `Dirty_Percentage` <> 0
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Dirty_Percentage` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Dirty_Percentage", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,100])
fig.show()
# We would now display the results of Total Clean energy generated.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
(`Nuclear` + `Hydro` + `Wind` + `Solar`) AS Clean_generation,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries)
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Clean_generation` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Clean_generation", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,3000])
fig.show()
# We would now display the results of Total Dirty energy generated.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
`Country`,
`Year`,
(`Coal` + `Gas` + `Oil`) AS Dirty_generation,
`Total`
FROM
meta_generation
WHERE
meta_generation.`Country` IN (SELECT * FROM Countries)
GROUP BY
`Country`,
`Year`
ORDER BY
`Year` ASC,
`Dirty_generation` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y="Dirty_generation", color="Total",
animation_frame="Year", animation_group="Country", range_y=[1,6000])
fig.show()
# We would now display the results of the energy situation for the latest year i.e. 2021.
import plotly.express as px
query_table = pd.read_sql_query(
'''
SELECT
meta_generation.`Country`,
MAX(`Year`) AS Latest_Year,
(`Nuclear` + `Hydro` + `Wind` + `Solar`) AS Clean_generation,
(`Coal` + `Gas` + `Oil`) AS Dirty_generation,
`Total`,
`Clean_Percentage`,
`Dirty_Percentage`,
RANK() OVER(ORDER BY Clean_generation DESC) AS Clean_Rank,
RANK() OVER(ORDER BY Dirty_generation DESC) AS Dirty_Rank,
RANK() OVER(ORDER BY Clean_Percentage DESC) AS Clean_Percentage_Rank,
RANK() OVER(ORDER BY Dirty_Percentage DESC) AS Dirty_Percentage_Rank
FROM
meta_generation
WHERE
`Country` IN (SELECT * FROM Countries) AND `YEAR` = 2021
GROUP BY
meta_generation.`Country`, `Year`
ORDER BY
`Total` DESC;
''', engine)
fig = px.bar(query_table, x="Country", y=[ "Dirty_generation", "Clean_generation"], text_auto= ".2s", title="Energy usage for year 2021")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.show()
# Disposing SQLAlchemy engine object.
engine.dispose()